jaydenseric / graphql-multipart-request-spec

A spec for GraphQL multipart form requests (file uploads).
993 stars 54 forks source link

Add graphql-upload-minimal to the Server implementations list #34

Closed koresar closed 4 years ago

koresar commented 4 years ago

Hello @jaydenseric

Thanks for the spec and the libraries. We've forked your top notch graphql-upload and adjusted it to our needs. Namely, we had to remove fs-capacitor because of security reasons.

Hopefully, someone else would find npm.im/graphql-upload-minimal useful.

Please, accept the PR.

mike-marcacci commented 4 years ago

Hi @koresar! Nice looking fork! I do have one quick comment about the fork (I'd open an issue in your repo, but they appear to be disabled):

Calling createReadStream() more than once does not create new streams but return same stream each time.

This is likely not the ideal behavior here. GraphQL resolvers get called in different orders and different event frames based on a wide range of factors, some deterministic, others not. Also, node streams have the property that consumed bytes are not replayed when an additional reader is attached. This means that should the same file or variable be used more than once in a query, all but the first-executed resolver could receive a truncated stream, with no way for the resolvers to detect such a scenario.

As it's impossible to predict the execution order of resolvers in an arbitrary query, we essentially have two options:

Option 1: buffer the stream's contents to allow the replay of consumed bytes

This is the option taken by the graphql-upload library. Because uploads are often large, buffering to memory is not an option, and we chose to use the operating system's temporary file storage (via fs-capacitor) to make this possible.

Option 2: disallow multiple references to the same upload or variable

This is another perfectly valid approach, which could support the majority of real-world queries, while avoiding the filesystem. This sounds much closer to the approach that you are taking in your fork. However, I would recommend that your library detect a query that reuses any variable of an Upload type and reject it up-front as invalid/unsupported. This would prevent some very, very difficult-to-debug scenarios.

Also note that the graphql-upload spec allows for file deduplication, such that the same upload populates multiple different variables. This scenario should also be detected when validating the query.


Anyhow, that's probably all for a discussion that should happen elsewhere. Feel free to hit me up if you have questions - and great work adapting this for some different use cases! (I'm also quite curious to learn what scenario requires you to avoid the filesystem.)

koresar commented 4 years ago

Hi Mike!

Thank you for the great write up!

Sorry for the Issues being disabled in the repo. I didn't notice. :) They are enabled now. Thanks!

There were two issues which lead me to forking:

I like your Option 2! Although, I don't know how to implement

library detect a query that reuses any variable of an Upload type

Don't even know where to start. But I definitely need to place a warning to the current release. And throw exception or something if createReadStream() is called more than once.

jaydenseric commented 4 years ago

Always glad to see others experimenting in this space :)

My thoughts ecco @mike-marcacci . graphql-upload originally didn't use the filesystem, and there was only one read stream per upload to use in the resolvers. We arrived where we are today with fs-capacitor after years of discovery and work.

I notice that two tests that reveal why buffering uploads to the filesystem is necessary have been disabled in graphql-upload-minimal:

  1. https://github.com/koresar/graphql-upload-minimal/blob/bf2e48806fbf409eeadcce7502b3baaf36d4f5b1/test/public/processRequest.test.js#L58
  2. https://github.com/koresar/graphql-upload-minimal/blob/bf2e48806fbf409eeadcce7502b3baaf36d4f5b1/test/public/processRequest.test.js#L174

I can see how graphql-upload-minimal could fill a niche, for people that can't use the filestem at all and are sure they will never use the same upload as an argument for multiple fields in a query. In that case, like @mike-marcacci suggested it would be good to document, handle and test those situations so users know the limitations and have decent errors if they get it wrong. Also, testing that is pretty important in case the whole server explodes with uncaught exceptions or something - it took a while to cater for and test all wrong sorts of requests that can happen in graphql-upload. In the early days certain malformed requests could crash servers. Streams are hard 😅

Another suggestion - borrow the GraphQLUpload scalar from graphql-upload, in case people are using that scalar already in published resolvers or something. The less fractured the community is regarding scalars with the same name (Upload), the better. That would negate the smaller install size benefit of graphql-upload-minimal though (I'm all about small install sizes 👍), so just an idea about what might be possible. Maybe it's not possible improving GraphQLUpload, just thinking out loud.

jaydenseric commented 4 years ago

@koresar

I definitely need to place a warning to the current release. And throw exception or something if createReadStream() is called more than once.

Maybe go back to the original graphql-upload api, where the Upload scalar promise resolved an already created stream instead of the modern createReadStream API.

jaydenseric commented 4 years ago

Hmm, but then it would be hard to warn if it is used twice. Maybe createReadStream that can throw on a second call 🤔

koresar commented 4 years ago

Yeah! Gonna throw on the second call as I told Mike.

throw exception or something if createReadStream() is called more than once.

Thanks for all the suggestions. I definitely need to make some improvements.

koresar commented 4 years ago

Done.

Thank you for the recommendations.