jaydenseric / graphql-multipart-request-spec

A spec for GraphQL multipart form requests (file uploads).
1.01k stars 56 forks source link

Null means the opposite of Null #4

Closed benwilson512 closed 6 years ago

benwilson512 commented 6 years ago

Hey folks,

I'm very happy to see a standardized spec surrounding file uploads, particularly one that lets you treat them as arguments. We've been doing something like this for a while with Absinthe https://hexdocs.pm/absinthe_plug/Absinthe.Plug.Types.html#content but would be happy to change to a more standardized format.

My trouble with this particular setup though is the use of null as a placeholder. This issue is both conceptual and technical. From a conceptual perspective we're using null to mean that something is there, which is the precise opposite of what null communicates. From a technical perspective, I'm not entirely sure how this would work with arguments that are marked non null. It seems counterintuitive to permit null to be a valid argument to a non null field.

Alternatives: If we aren't trying to use the values to mean much of anything then simply an empty string would suffice, and help avoid the problems I've outlined. In the Absinthe implementation the strings point to the HTTP part that contains the file they want, but the mapping solution you have already handles that sort of issue.

Thoughts?

jeroenvisser101 commented 6 years ago

I had the same feeling when I was looking at the spec and trying to implement it. I did notice it at first but went ahead on implementing it. I quickly got stuck because of the implementation of fields marked non null that would be handled at a higher layer within Absinthe, which is fully compliant with the GraphQL spec and performant, since it returns an error fast.

I tried working around it when @benwilson512 brought up the point that using null to mean something completely different as 'no value'.

The suggesting of sending empty strings would still not make much sense, and if we're changing this I'd much rather suggest on putting the file's key as string as a value of the variable. I know that requires significant changes on both the client and server.

PS. Using the file keys as strings as value will also make handling batched operations much easier, since you don't have to know what the number of the operation is, just the file key, which I think is a nice bonus.

jeroenvisser101 commented 6 years ago

As I was looking into your current implementations to see if there was a possibility of doing file keys as the value, it seems like the JavaScript server implementation is a little more limited, not allowing to access the context of the request (server implementation, JS API). The map seems to be a necessity. Can we change the spec to send the file keys both in the map and in the variables itself?

benwilson512 commented 6 years ago

Absinthe is sufficiently flexible that you could abuse internals and add a pass that would turn the nulls into files prior to the non null validation or something of that sort, it just seems like an undesirable thing to do.

The suggesting of sending empty strings would still not make much sense

Well, empty strings are a more sensible placeholder than null. I'm not necessarily saying that that's my favorite solution, I also like having the strings actually mean something, I'm simply articulating what I think is the minimal amount of change required to at least solve the null issue.

jeroenvisser101 commented 6 years ago

Agreed and agreed, that would be the minimum change to make it work without workarounds.

I would however prefer a change in which they are sent in both the map and the variables, so whichever the server can read it will use. What do you think about that?

edit: it does seem a bit silly to send it twice, but I feel it makes more sense than null or an empty string

jaydenseric commented 6 years ago

Glad to see more implementations popping up!

null is intended to mean defined under a name, but not yet populated with anything. In my opinion it's a perfect semantic fit. At the time the operation JSON is received by the API server, no files have been uploaded yet. Their values at that time are in fact null. They may never be received if the request is malformed or the multipart stream fails.

React Apollo uses null to represent entities being fetched.

You could make a client implementation that uses "foobar" or anything else that can be JSON serialized as file placeholders, but it would be a waste of bandwidth. Server implementations such as apollo-upload-server will never read the values; the only time they are interacted with is to be overwritten by the Upload scalar promise.

In false starts I was fully deleting the properties so they don't appear as a key or a value in the operation object being transported, since we have enough information in the file object paths to rebuild them later anyway and less bytes down the wire is nice. I ran into problems with arrays though, because if you have files in a number of the positions extractions change the indexes and it's not certain the array items will be in the same order at the other end. That is why I found it conceptually more robust to work with null placeholders.

It makes the request fields more intuitive to inspect too, since whole variables don't appear to be "missing". Although the spec should be about technical elegance and performance than readability, in the same way you are not meant to be able to understand a gzip payload in transport.

I am up to something like my fourth client implementation now, and have found it particularly simple to be able to just delete the files in the extract recursion. JSON serialization automatically converts undefined or empty values to null.

I have tried and failed to understand the implementation difficulties being discussed here. I don't know Absinthe, but conceptually everything happening with client and server transport implementations is outside the realm of GraphQL. GraphQL systems should never see the null file values; they will be promises by the time the operation reaches any resolvers.

benwilson512 commented 6 years ago

Hey @jaydenseric, thanks for the reply.

I'm not entirely sure that the mechanics of how multipart works on certain platforms are relevant here, nor are conventions about whether null is a placeholder in your language of choice. I'm looking at the semantics of the GraphQL document itself:

For reference, the spec says about the following example:

{
  field(arg: null)
  field
}

The first has explictly provided {null} to the argument "arg", while the second has implicitly not provided a value to the argument "arg". These two forms may be interpreted differently. For example, a mutation representing deleting a field vs not altering a field, respectively. Neither form may be used for an input expecting a Non-Null type.

So if we have a file upload mutation where the file argument is no null then the following document is ipso facto invalid.

mutation {
  uploadFile(file: null) {
    id
  }
}

It puts the server in a position of re-writing a presently invalid document into a valid one. This should be avoided. You can take the position that this is purely a transport concern, but if we go that route then we're no longer transporting GraphQL, we're transporting a dialect thereof.

GraphQL systems should never see the null file values; they will be promises by the time the operation reaches any resolvers.

The user of the GraphQL system may well never see null values, but presumably the GraphQL system itself does, and then has to go through the process of turning them into not null values prior to validation. I'm not sure how you'd do this without involving the GraphQL system because at a minimum you'd need to parse the query document itself, and likely need to lookup what args are File types so that you don't overwrite null for other argument types. As a rule, I don't think GraphQL systems should be in the business of turning invalid, non spec compliant documents into valid ones.

but it would be a waste of bandwidth.

This feels a bit nit picky given that the number of bytes consumed by "null" * N files is generally going to be but a fraction of file_size(N files), but in any case such a concern would not be true of using an empty string. More to the point, a thing is a waste if it provides no value, or if the cost greatly exceeds the value. I've tried to argue here that there's value to be had in a spec compliant approach, so even if we went with something like "file.0" the extra handful of bytes would not in my view constitute waste.

Given that it seems like we can be fully spec compliant by the use of an empty string and a custom scalar, I'm unclear on what advantages using null bestows.

jaydenseric commented 6 years ago

we're no longer transporting GraphQL, we're transporting a dialect thereof

Transport is not part of GraphQL, what we are doing is not even a dialect of it. An Apollo operation object is something made up for Apollo server and client to use to talk between each other. A lot of GraphQL servers happen to have adopted a similar shape for POST requests, but it is not part of the GraphQL specs. With persisted queries you don't even send a query. Batching is a made-up thing in Apollo, where an array can be sent. Until now, there has been no convention or spec for Multipart GraphQL server POST requests at all.

The user of the GraphQL system may well never see null values, but presumably the GraphQL system itself does

No it doesn't, if you don't count our upload implementations as part of the "GraphQL system". Our upload middleware converts the multipart request back into something the GraphQL server can later digest, like this. I have had in-person discussions with the Apollo team about building these smarts directly into the official GraphQL server. There has also been a desire to settle on a spec so that Apollo Engine can be taught to work with multipart requests.

GraphQL systems should [*not?] be in the business of turning invalid, non spec compliant documents into valid ones

An Apollo operations object, where the null values being discussed are, is not a "spec compliant document" and nor is it invalid for some properties in the object variables to have null values. The document you must be referring to is the string contents of the query property? We don't touch that anyway.

Given that it seems like we can be fully spec compliant by the use of an empty string and a custom scalar, I'm unclear on what advantages using null bestows.

Please link to which part of which spec you are referring to. I'm unclear how an empty string would represent a binary file that has not uploaded yet any better than null. I can't see value in pasting arbitrary text.

Feel free keep the discussion going, I'll reopen if something concrete comes up 🙂