jaydenseric / graphql-multipart-request-spec

A spec for GraphQL multipart form requests (file uploads).
993 stars 54 forks source link

README graphic is *somewhat* misleading regarding buffering #68

Closed Sytten closed 10 months ago

Sytten commented 1 year ago

Hello there! I was reading the various implementations to see how people where dealing the streaming of files without writing to temp disk first. In the rust implementation, we always buffer on disk or memory first and it looks like the other implementations also do that since multipart is sequential in nature so by default if you need the second file you need to buffer the first file somewhere.

So I feel like it the graphic presented is somewhat misleading, the stream as in the data from the client is never really passed to the resolver. Some implementations might have an optimization for a single file upload, but that seems like the exception to the general rule. Unless I am mistaken even the JS implementation uses fs-capacitor to buffer on disk. I think it should be made clearer that this is a limitation of multipart, you just can't get concurrent streams of files.

That being said, I am wondering if there should be a spec for single file upload where we could get the stream of data without buffer into the resolver?

jaydenseric commented 1 year ago

Sync vs async GraphQL multipart request middleware

It's a bit tricky to communicate the nuances in a single graphic, but the general idea of the graphic is that there tends to be two distinct approaches for file upload middleware:

  1. The middleware awaits all the files to have finished uploading, and it stores them all somewhere. Once that is done, the middleware hands-off to the next middleware metadata about all the received files (typically a list of filesystem paths to where they are all stored). This is the "sync" scenario in the graphic. This is how a lot of the Express and Koa middleware (not specific to GraphQL requests) for processing multipart request file uploads available on npm works.
  2. The middleware passes a promise for each file upload stream on to the next middleware. How that is achieved under the hood varies; graphql-upload does in fact buffer each of the streams to the filesystem using fs-capacitor, but that's an implementation detail that doesn't render the timings in the graphic as inaccurate. There is the alternative package graphql-upload-minimal that achieves a similar API, but it doesn't buffer the file upload streams to disk; it really works as you would intuit from the graphic. The raw file upload streams are passed onto the next middleware as promises. The downside with that naive approach is that you can't have same file referenced twice in a GraphQL query/mutation, because the resolvers for each place it's used would compete to consume the same readable stream.

I think it should be made clearer that this is a limitation of multipart, you just can't get concurrent streams of files.

The graphic doesn't show the file streams happening in parallel though; they are received one after another in the timeline. The main difference between the “sync” and “async” scenarios is when the resolvers run.

Do you have a specific suggestion for rewording a particular part of the graphic?

The main purpose of that graphic is to get people to consider some of the technical requirements for achieving the “async” scenario, that justify the spec mandated fields like map.

jaydenseric commented 10 months ago

Closing as per the answer above, https://github.com/jaydenseric/graphql-multipart-request-spec/issues/68#issuecomment-1694798413 .