graphql / graphql-spec

GraphQL is a query language and execution engine tied to any backend service.
https://spec.graphql.org
14.28k stars 1.12k forks source link

JSON Key Order #168

Closed benwilson512 closed 8 years ago

benwilson512 commented 8 years ago

Hey folks,

Per https://github.com/facebook/graphql/commit/d4b4e67f083ff6699a8f5fac5e069faba89c5bbb it seems to imply that GraphQL implementations must guarantee a particular key order in JSON responses sent back from GraphQL servers.

I'm struggling to figure out how this isn't plainly at odds with the JSON spec which states: http://www.json.org/

An object is an unordered set of name/value pairs. 
stubailo commented 8 years ago

Yeah, it makes total sense to me to introduce alternate transports to improve performance, but it would be great of one of the transports were "plain JSON" without any additional requirements for the client or server.

OlegIlyenko commented 8 years ago

As far as I see it, GraphQL is agnostic of transport and data format. So it's ok to define this kind of requirement in GraphQL spec, in my opinion. Whether some particular data format (like JSON, MessagePack, etc.) supports it or not is a different question. As far as I understand the spec, it just asks for a "best effort" in this respect. Formats that have field ordering guarantees (like MessagePack) must follow an order of the fields in the GraphQL query. Formats that do not have these guarantees (like JSON) may follow this rule, if particular implementation allows it, but not obliged to do it. Recently we had a discussion about this in sangria gitter chat as well.

I added support for field ordering in sangria (scala impl) as the spec change was introduced and there was at least one JSON library that does not support fields ordering, so people who are using it will not get these guarantees. On the other hand, there are plenty of other scala JSON libraries that do support field order (even though JSON spect does not require it).

stubailo commented 8 years ago

Yeah I guess the worrying part is in the commit message:

This enables custom parsers which expect JSON values to be a very specific shape for performance reasons,

Because it specifically refers to JSON.

I think the issue here is that rather than introducing an option to have ordered JSON as one of the transports, the spec change seems to require it for all GraphQL endpoints that return JSON in particular.

benwilson512 commented 8 years ago

Right yeah. I would be perfectly happy if the stance was "do it if possible" but the RFC specifically mentions JSON:

+Response formats which support ordered maps (such as JSON) must maintain this
 +ordering.

https://github.com/facebook/graphql/commit/d4b4e67f083ff6699a8f5fac5e069faba89c5bbb#diff-86a7cb172ae1d944ca3719bd848b1d8bR353

I'm trying to figure out if this was just an error / oversight or if they're intentionally trying to override the JSON spec.

OlegIlyenko commented 8 years ago

Indeed, the spec definition is a bit confusing: JSON itself does not have support for ordered maps, so it can't really serve as an example in this case. I guess I interpreting it in a way that complies with the JSON spec and the current state of the JSON libraries.

leebyron commented 8 years ago

Good feedback (sorry for delayed response). I'll look into clarifying the language in the spec.

To fill in some context and detail this isn't an error or oversight, it's an intentional override of the JSON spec. I think the JSON spec is worded in a way that's not quite correct.

An object is an unordered set of name/value pairs.

This is untrue.

A better wording would have been something like "An object is a set of name/value pairs which might be unordered".

What's actually trying to be represented here (AFAIK) is that JSON is designed to be portable to many languages, not just JavaScript, many of which don't have ordered maps, or don't want to assume the overhead of enforcing order. If JSON spec claimed that objects are ordered kv pairs, then it would enforce that parsing JSON would always result in ordered maps, something which can't always be done by every language. However in my opinion if not "ordered" doesn't have to be "unordered" it could be "unknown if ordered or unordered" - parsing a JSON object as an ordered list of kv pairs into an ordered map should also be totally legal. In other words, JSON objects do not specify if they are ordered or unordered, thus it is unsafe to assume they are ordered but safe to assume they are unordered.

From a technical point of view, JSON objects are definitely ordered! JSON is defined as a string-serialization format. Strings are arrays of characters. Arrays are ordered. Therefore JSON is ordered. When you parse a JSON-serialized object, you are absolutely certain to get first in your parser the KV pair that appears first in that JSON string. This is an important, unfortunately unspecified, but often relied on property of JSON. If your language's maps are ordered (cough JavaScript) then you would expect parse(stringify(orderedObj)) to have the KV pairs in the same order.

This second reason is why GraphQL requires maintaining ordering if the serialization format allows for it. If a client can guarantee not just the fields that are going to be returned by the server but also the order they'll appear in then a client can employ a much more efficient response parsing strategy. From a developer experience point of view, it's also much easier to understand and debug query responses when the order of fields mirrors the request.

That's a lot of nuance to cram into a spec. Does that answer questions? Am I off my rocker? I'm interested in feedback so I can clarify the spec in the best way possible.

stubailo commented 8 years ago

@leebyron just one slightly off-topic question - how would one employ a more efficient parsing strategy that takes advantage of this property?

leebyron commented 8 years ago

There are two examples that seem to come up.

One is if your parsing strategy depends on the type of object that returned. In that case you're often querying for __typename first, and if that can guarantee that __typename is the first field to come back in the response, then you can use that to readValue() from your JSON lexer and then use the result as the input to a switch statement. Without the guarantee of ordering, you would first have to parse the whole JS object, buffering it somewhere, and then look to see if any key was __typename before continuing into your switch.

The second is in building natively typed objects or structs as the outcome of your parser. You might have a parser that lexes a bunch of values from the JS object into local stack variables and then returning a struct from them. Having some guarantee about their ordering can help avoid buffering them all into a Dictionary/Map type intermediate structure first.

I've seen this technique employed in iOS / ObjectiveC before to still leverage JSON but avoid having a typical two-pass parse where the first pass converts a JSON string to nested NSDictionary / NSArray and the second pass turns those NS structures into strongly typed ObjC objects. If you know something about the shape the JSON will return in, then you can avoid the NSDictionary building first which can save memory and time in aggregate.

bruce commented 8 years ago

Based on the above "JSON is defined as a string-serialization format. Strings are arrays of characters. Arrays are ordered" and "GraphQL requires maintaining ordering if the serialization format allows for it," are we to assume GraphQL requires every ordering for every "string-serialization" format? How about binary formats -- aren't their bits ordered? With this in mind, what format does GraphQL foresee not requiring ordering for?

leebyron commented 8 years ago

With this in mind, what format does GraphQL foresee not requiring ordering for?

Great question! Partly I'm not sure, but also partly I think I may have missed a detail in the spec. It could be the case that while JSON is fully capable of representing an ordered map that the underlying programming language environment is not, or at least not without creating undue burden. I can't think of an example off the top of my head, but the possibility exists.

Another situation may be if the way a serialization format represents a map is via a data structure where truly KV ordering is not possible. Yes all computer memory is just arrays of bytes and thus everything is inherently ordered, but sometimes that ordering is used to encode something else. Consider a hypothetical serialization format which encodes KV maps as Tries. Then it would be a strict requirement of that encoding that the map could only be ordered via the internal mechanics of the Trie implementation, and could not be capable of encoding any other arbitrary ordering.

leebyron commented 8 years ago

N.B. JavaScript objects were for the longest time un-specified as to their ordering (which is probably where JSON derived this unordered-ness) however every JS engine implemented property ordering the same way and many programs relied on this ordering. Rather than leaving them as unordered and risking a browser "breaking the web" while technically adhering to the spec, the ES6 edition decided to just go ahead and describe how JS object properties are ordered (http://www.ecma-international.org/ecma-262/6.0/#sec-ordinary-object-internal-methods-and-internal-slots-enumerate & http://www.ecma-international.org/ecma-262/6.0/#sec-ordinary-object-internal-methods-and-internal-slots-ownpropertykeys)

bruce commented 8 years ago

Keep in mind I don't have any problem with ordering -- I just think, as others have mentioned, that calling it out specifically for JSON just sounds incorrect -- I don't think most people would agree that JSON spec is "not quite correct" in calling objects unordered, as so many languages (perhaps the vast majority) consume and generate to/from JSON in an unordered manner. While I think it would be fine for the spec to state ordering should be maintained whenever possible, forcing people to build ordered JSON serializers (specifically) to support what amounts to a very unique GraphQL interpretation of the JSON spec doesn't seem right.

bruce commented 8 years ago

Also, there's a big difference between implicit ordering in a (single language) datastructure and ordering in a data interchange format -- I'm happy the JS implementations agreed and -- notably -- the spec was updated to reflect that, but JSON has a much wider user base.

bruce commented 8 years ago

One quote from ECMA-404: "Programming languages vary widely on whether they support objects, and if so, what characteristics and constraints the objects offer. The models of object systems can be wildly divergent and are continuing to evolve. JSON instead provides a simple notation for expressing collections of name/value pairs. Most programming languages will have some feature for representing such collections, which can go by names like record, struct, dict, map, hash, or object." (This is why I think forcing ordering on JSON is a more complex issue than was the case with JS objects.)

leebyron commented 8 years ago

Thanks, I agree that it definitely sounds weird without any additional clarification. I certainly want to amend the spec to make this more clear.

It's not necessarily that many languages consume and generate JSON in an unordered manner, it's that they're consuming from and generating to data structures which are themselves unordered. I would wager that the JSON generation and consumption themselves are in fact ordered, in every implementation you could find, just because of the nature of parsers and generators being inherently linear processes.

I think a part of what's missing from the spec is this little bit of detail about the backing data structures. It's not just the serialization format that's at play here, it's also the backing data structures in the host language. For example the python implementation of GraphQL first used unordered maps to represent the field responses (before this field ordering language was added to the spec) and it caused confusion. So Syrus changed to using ordered maps and that was a motivation to add the language to the spec. Of course he didn't touch the JSON serializer in python, just the underlying data structure.

leebyron commented 8 years ago

This is why I think forcing ordering on JSON is a more complex issue than was the case with JS objects

Yes definitely. also luckily for us GraphQL limits a little bit of the scope described by that phrasing in ECMA-404. GraphQL doesn't have a KV-Map type, so serialized JSON objects always correspond to a "struct" or "record" type of abstraction. It's still true that the languages on both sides, server and client, may have differing characteristics of the ordering of the properties of structs/records, but GraphQL itself does have ordering in the fields requested in a selection set, which is why we can ask for the requirement if the host server is capable of fulfilling it.

bruce commented 8 years ago

That's why this should say something like "implementations should attempt to maintain ordering" instead of calling out JSON specifically as ordered. Sure, it's great the Python implementation did that -- I agree ordering is a nice feature -- and we may certainly do the same from Elixir (by definition, to support ordered serialization formats, the core implementation needs to support ordering). It's largely an issue of language -- this should be a MAY for JSON -- not a SHOULD, and most certainly not a MUST.

leebyron commented 8 years ago

Great feedback!

benwilson512 commented 8 years ago

Nothing about a string of bytes makes it intrinsically one thing or another. Data structures only "exist" in the sense that we write down rules about how we're going to treat those bytes, and write programs that interact with those bytes in that way. Concretely, the JSON spec is the definition of what JSON is, not the bytes that make up an example of JSON.

There's always a plethora of ways to interpret something. [1,2,3] is not equivalent to [2,1,3] if we treat them as lists, but if we treat them as sets they are. If we treat them as text in a github comment they're in different places so they're obviously not the same. It's vital that we agree on the definitions of what things are so that we can interact with them in a consistent, predictable, and reliable way.

The folks who developed the JSON spec have done just that, and it's concerning that we aren't giving due regard for what JSON says about itself. We don't want users of GraphQL making up extra rules about how to interpret a GraphQL document and we shouldn't do the same with JSON.

leebyron commented 8 years ago

The folks who developed the JSON spec have done just that, and it's concerning that we aren't giving due regard for what JSON says about itself. We don't want users of GraphQL making up extra rules about how to interpret a GraphQL document and we shouldn't do the same with JSON.

To be clear here, I don't want to disregard what JSON is capable of representing. I know this discussion is getting fine-grained about definitions of things. I'm trying here to make a distinction between what JSON semantically represents vs the JSON grammar. When I say that JSON is a sequence of characters, I'm referring to the JSON Grammar. Grammars are definitionally linear. [1,2,3] and [3,2,1] are grammatically different. JSON semantics are still what they are, unchallenged. I agree with all your points that if you want to treat [] as a Set vs as an Array that the semantic understanding changes how to interpret the grammar. JSON made it clear that a {} grammar should imply nothing about order. Above I was trying to point out that many rely on the un-specified behavior of JSON objects being parsed in order (as the very nature of parsers operating linearly) and the GraphQL spec wants to have an opinion on how the JSON grammar is generated (not on how it is interpreted by a client)

leebyron commented 8 years ago

Those interested - I attached a PR to change some of the spec language and I'm interested in your feedback on it.

leebyron commented 8 years ago

Perhaps a good parallel here is JavaScript's spec for JSON.stringify (http://www.ecma-international.org/ecma-262/6.0/#sec-json.stringify) - while JSON objects are specified as representing unordered collection, JavaScript still specifies the generation and parsing of JSON in an ordered manner. The fact that the properties of a and b in a = JSON.parse(JSON.stringify(b)) appear in the same order is actually guaranteed by the JS spec, despite JSON itself not having an semantic ordering