jaydenseric / graphql-upload

Middleware and an Upload scalar to add support for GraphQL multipart requests (file uploads via queries and mutations) to various Node.js GraphQL servers.
https://npm.im/graphql-upload
MIT License
1.43k stars 131 forks source link

Occasional empty / zero byte streams at high load #318

Closed captainjapeng closed 2 years ago

captainjapeng commented 2 years ago

Hi!

First of all, thank you for this great library! I've been using it for a couple of years now and it really makes the GraphQL developer experience better 🙂.

Unfortunately, we're seeing occasional errors where the createReadStream() would have zero bytes on our production environment with very little load running on GKE even though the whole multipart request was received (I've added a PassThrough stream to print out the whole request when the stream length is zero).

I have been able to replicate this on a local K8S when load testing the service at 40-60 rps with a chance of around 1 in 5000. Basically, the service in question here is a GraphQL Gateway, we're using graphql-upload v13.0.0's processRequest as a middleware for fastify to process the body and then passing it to apollo-server-fastify and @apollo/gateway together with apollo-federation-file-upload as the datasource to replay the file upload.

My initial assumption was that it has something to do with the networking/parsing layer but I had run a separate test using the received request body directly to dicer in a >100,000 loop to see if it was the cause but the result didn't have any errors parsing the multipart.

I have a couple of theories as to why this happens.

  1. Upon diving with fs-capacitor I have learned that all file uploads create a temporary file so maybe during high load it fails to create a temporary possibly due to too many files being open? I'm currently thinking if this is related to this issue
  2. During debugging, whenever I encounter the zero byte stream the onData and onEnd was never called, I'm not super experienced when it comes to streams but would it be possible for createReadStream() to execute before the data is written to the FileStream?
  3. It may also be the combination of the two? During high load disk writes can cause high latency which affects reading of the file when calling createReadStream.

Our solution right now is to migrate to graphql-upload-minimal since we're only using the stream to pass through to the receiving backend service. I haven't encountered the issue so far with this setup even at 120 rps load.

jaydenseric commented 2 years ago

Thanks for the detailed issue.

Normally in a situation like this I would cast suspicion on all the Apollo and fastify stuff influencing the streams, but it sounds like your investigation has moved passed that.

It would be interesting to see if you have the same problem after upgrading to graphql-upload v15+, as that uses busboy v1 which has a fresh, more modern approach to the streams.

You can see an explanation about when our fs-capacitor dependency will be updated here: https://github.com/jaydenseric/graphql-upload/pull/290#pullrequestreview-981670520 .

jaydenseric commented 2 years ago

One thing you could do as an experiment is copy-paste the new fs-capacitor code into your node_modules/fs-capacitor/ from here, and manually change the newer ESM to CJS so it will work:

https://unpkg.com/browse/fs-capacitor@7.0.1/dist/index.js

captainjapeng commented 2 years ago

I will try to conduct the experiment on a much simpler project with fewer dependencies.

captainjapeng commented 2 years ago

I was able to conduct the experiment both with graphql-upload@13.0.0 which my current setup uses and with the latest graphql-upload@15.0.1 with patched fs-capacitor@7.0.1.

Both were experiencing the same zero-length buffer issue but is more frequent on the newer version, I've attached some screenshots.

v13.0.0

image

v15.0.0 with patched fs-capacitor

image

Here's the repository of the experiment https://github.com/captainjapeng/graphql-upload-zero-buffer

captainjapeng commented 2 years ago

After adding a breakpoint whenever the buffer was zero, it seems to appear that the tmp file of fs-capacitor is successfully written

image

jaydenseric commented 2 years ago

I don't have much to add except to say, great job @captainjapeng! Thanks for your diligent work figuring this out.

We have a conundrum because if the fix (i.e. https://github.com/mike-marcacci/fs-capacitor/pull/76) is applied to fs-capacitor v7, which is pure ESM, then graphql-upload which is currently still CJS won't get it as we are still dependant on v6. Either we can update graphql-upload to pure ESM and update the fs-capacitor dependency to v7 earlier than planned (see https://github.com/jaydenseric/graphql-upload/pull/290#pullrequestreview-981670520), or, maybe we could update to fs-capacitor v7 and keep graphql-upload CJS for a bit longer by using a dynamic import to get fs-capacitor when it's needed within the async processRequest function. Some people bundle their server projects, and their bundling might not account correctly for the dynamic import so for those guys it might be considered a major change.

captainjapeng commented 2 years ago

I understand, our solution with graphql-upload-minimal seems to be working fine and might be the better route for a gateway service. In the meantime, I've attached a patch file to be used with patch-package if there is a need for an immediate fix for other people.

patches/fs-capacitor+6.2.0.patch

diff --git a/node_modules/fs-capacitor/dist/index.js b/node_modules/fs-capacitor/dist/index.js
index 91c6d96..7b4e772 100644
--- a/node_modules/fs-capacitor/dist/index.js
+++ b/node_modules/fs-capacitor/dist/index.js
@@ -48,7 +48,12 @@ class ReadStream extends stream_1.Readable {
             // If there were no more bytes to read and the write stream is finished,
             // than this stream has reached the end.
             if (this._writeStream._writableState.finished) {
-                this.push(null);
+                // Check if we have consumed the whole file up to where
+                // the write stream has written before ending the stream
+                if (this._pos < this._writeStream._pos)
+                    this._read(n);
+                else
+                    this.push(null);
                 return;
             }
             // Otherwise, wait for the write stream to add more data or finish.
mike-marcacci commented 2 years ago

Hi folks – @captainjapeng thanks for the diligent research on this.

@jaydenseric I'm happy to entertain dual publishing fs-capacitor with both cjs and mjs if that's needed here. I don't want to hijack this issue, but wanted to let you know that's an option (despite it rubbing my OCD the wrong way). I do suppose that doing a dynamic import() might be a reasonable workaround, though, given how small that extra processor tick is compared to the rest of the operation.

jaydenseric commented 2 years ago

It seems the idea to dynamic import pure ESM fs-capacitor is not going to work due a TypeScript issue: https://github.com/microsoft/TypeScript/issues/49055#issuecomment-1151747145

Side note about the dynamic import approach; I had an idea about doing the dynamic import on the first function call, and storing the result in a let outside of the function scope so it can be used from then on instead of awaiting a promise for the dynamic import again and again each function call. But I couldn't find any information about if Node.js has optimisations for multiple dynamic import calls of the same thing (are calls after the first faster?) or if awaiting a promise really saves that much time or system resources.

Thanks for offering to add a CJS entry point to fs-capacitor, but maybe I should just move graphql-upload to pure ESM and be done with it.