Closed dylanahsmith closed 2 years ago
I also was reading the spec and realised this. Given the paragraph around it:
GraphQL documents are expressed as a sequence of Unicode characters. However, with few exceptions, most of GraphQL is expressed only in the original non‐control ASCII range so as to be as widely compatible with as many existing tools, languages, and serialization formats as possible and avoid display issues in text editors and source control.
It sounds to me like an error of ignorance rather than intent.
Rust formerly had \uXXXX and \UXXXXXXXX, but changed to \u{x} (the same as Swift does) some time before Rust 1.0.
JSON (which is probably the main inspiration for the GraphQL syntax) does \uXXXX and uses the abomination that is UTF-16 surrogate pairs as a way of representing higher-order characters, e.g. U+1F600 (😀) is escaped as "\ud83d\ude00" in JSON.
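As a rough illustration of the surrogate-pair arithmetic described above, here is a minimal Python sketch. The helper name `to_surrogate_pair` is made up for this example; only the standard-library `json` module is real, and it is used just to cross-check the result.

```python
import json

def to_surrogate_pair(code_point: int) -> tuple:
    """Split a code point above U+FFFF into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000          # 20 bits remain after the subtraction
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1F600)
# Build the JSON-style escaped string literal, including its quotation marks.
escaped = f'"\\u{high:04x}\\u{low:04x}"'

print(escaped)                                     # the escaped form
print(json.dumps("\U0001F600"))                    # json escapes non-ASCII by default
print(json.dumps("\U0001F600", ensure_ascii=False))  # or emits the character literally
```

With `ensure_ascii=False` the character is written literally, which is the "express values literally" route that sidesteps surrogate pairs altogether.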
Fortunately you can avoid that insanity by simply expressing values literally. There’s no real need for the escapes anyway once you get past U+001F (\u00XX) and U+0022 (\"). (Unless you deal with combining characters that will attach to a string’s quotation marks, which is fearfully ugly and points out the grammatical problem of parsing by codepoint rather than grapheme cluster, but this is all more advanced stuff that we wish wouldn’t happen in real life, anyway.)
Also currently the escapes listed (EscapedCharacter) match those of JSON. (I think. As to the interpretation, what the GraphQL spec actually says is that \f would be U+0066, “f”, rather than U+000C which is what we all know it’s supposed to be. It’s really badly written.) Given that general tie, supporting \uXXXX might not be a terrible idea, with or without \u{X}.
The definition of the handling of EscapedUnicode is also extremely tacky, with spelling errors, poorly defined terms, &c.:
Return the character value represented by the UTF16 hexidecimal identifier EscapedUnicode.
What does that even mean? Seriously, that doesn’t make sense.
This stuff all suggests to me that it was written by someone with a poor understanding of Unicode. This spec gravely needs both editorial and technical review.
I want to see how different implementations parse:
"\ud83d\ude00": nonsensical in the current specification. If GraphQL wants to be like JSON, handling it as UTF-16 surrogate pairs is probably a good idea. If not (please don’t go for surrogate pairs!), the grammar needs to be changed to allow for the supplemental planes (such as via \u{1F600}).
"😀": illegal in the current specification, shouldn’t tokenise. However, I hope that implementations accept it and treat it as a string containing the code point U+1F600.
Thanks for bringing this up! Great thought process already happening.
I agree that surrogate pairs are an obtuse API. I'd like to avoid them if possible, though there is one serious upside to consider: they mirror JSON. That might not be enough to motivate it as the solution, but it certainly shouldn't be dismissed.
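To make the trade-off concrete, here is a minimal Python sketch of a decoder that accepts both forms discussed in this thread: JSON-style \uXXXX (combining surrogate pairs) and the braced \u{X} form. It is purely hypothetical: the name `decode_unicode_escapes`, the 1 to 6 digit limit on the braced form, and the choice to reject lone surrogates are all assumptions, not anything the spec or an existing implementation defines.

```python
import re

# Either \u{X...} (assumed here: 1-6 hex digits) or \uXXXX (exactly 4 hex digits).
_ESCAPE = re.compile(r"\\u\{([0-9A-Fa-f]{1,6})\}|\\u([0-9A-Fa-f]{4})")

def decode_unicode_escapes(text: str) -> str:
    """Decode only Unicode escapes; other escapes such as \\n or \\\\ are
    deliberately ignored to keep this sketch short."""
    out = []
    pos = 0
    while pos < len(text):
        m = _ESCAPE.match(text, pos)
        if m is None:
            out.append(text[pos])
            pos += 1
            continue
        braced, fixed = m.groups()
        code = int(braced or fixed, 16)
        pos = m.end()
        if fixed and 0xD800 <= code <= 0xDBFF:
            # High surrogate: require a \uXXXX low surrogate right after it.
            nxt = _ESCAPE.match(text, pos)
            if nxt is None or nxt.group(2) is None:
                raise ValueError("high surrogate without a low surrogate")
            low = int(nxt.group(2), 16)
            if not 0xDC00 <= low <= 0xDFFF:
                raise ValueError("high surrogate without a low surrogate")
            code = 0x10000 + ((code - 0xD800) << 10) + (low - 0xDC00)
            pos = nxt.end()
        elif 0xD800 <= code <= 0xDFFF:
            raise ValueError("lone surrogate is not a Unicode scalar value")
        elif code > 0x10FFFF:
            raise ValueError("escape is beyond U+10FFFF")
        out.append(chr(code))
    return "".join(out)

print(decode_unicode_escapes(r"\ud83d\ude00") == decode_unicode_escapes(r"\u{1F600}"))
```

Both spellings decode to the same single code point; the surrogate branch is where most of the extra complexity (and most of the error cases) comes from.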
Here are some action items:
@dylanahsmith and @chris-morgan I'd love your feedback on #231
:+1: for this spec' !
To be able to build an enum like
enum MOOD {
😩
😞
😕
😐
🙂
😃
}
@Nabellaleen Allowing emoji in an identifier is a completely different thing from allowing it in a source document, which is mostly for the sake of strings. And allowing emoji in identifiers is generally a poor idea; most languages stick with UAX #31’s definition for identifiers.
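For a quick feel of the identifier point, here is a small Python sketch. It compares GraphQL's ASCII-only Name pattern with Python's own `str.isidentifier()`, which is used only as a convenient stand-in because Python's identifier rules are derived from UAX #31 (XID_Start / XID_Continue). The helper `is_graphql_name` is a made-up name for this example.

```python
import re

# GraphQL Name: an ASCII letter or underscore, then ASCII letters, digits, underscores.
GRAPHQL_NAME = re.compile(r"[_A-Za-z][_0-9A-Za-z]*")

def is_graphql_name(s: str) -> bool:
    return GRAPHQL_NAME.fullmatch(s) is not None

for candidate in ["HAPPY", "mood_1", "caf\u00e9", "\U0001F629"]:
    # Columns: candidate, valid GraphQL Name?, valid UAX #31-style identifier (Python's rules)?
    print(repr(candidate), is_graphql_name(candidate), candidate.isidentifier())
```

Emoji are symbols rather than letters, so they fail both checks, whereas a non-ASCII letter such as é passes the UAX #31-style check but not GraphQL's Name pattern.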
I would be interested to revive this discussion: I don't see a reason for restricting it and we already see implementations and APIs having descriptions with emojis in them (e.g. the GitHub API).
fyi: graphql-ruby supports all unicode chars (cc @rmosolgo) and we decided to do the same for GraphQL Java.
The build fails if you have emoji in the path.
I created a new issue which outlines proposed changes to the spec to allow for full unicode support: #687
Currently a GraphQL document only allows a SourceCharacter :: /[\u0009\u000A\u000D\u0020-\uFFFF]/, and EscapedUnicode :: /[0-9A-Fa-f]{4}/ also prevents unicode characters above U+FFFF from being included in a GraphQL string.
Unicode code points are actually in the range 0 to 0x10FFFF. For example, unicode emoji characters like 😀 (U+1F600) have code points above U+FFFF.
Is there any reason why the source document doesn't allow unicode characters above U+FFFF? Or can we remove that restriction? Without that restriction the limitation of the unicode escape doesn't seem problematic.
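A small Python sketch of what that restriction means in practice. The first pattern is the SourceCharacter range quoted above; the second, widened to U+10FFFF, is only an assumption about what lifting the restriction could look like, and `first_rejected` is a made-up helper.

```python
import re

# SourceCharacter as currently specified (tops out at U+FFFF).
CURRENT_SOURCE_CHARACTER = re.compile(r"[\u0009\u000A\u000D\u0020-\uFFFF]")
# Hypothetical widened range covering every Unicode code point.
WIDENED_SOURCE_CHARACTER = re.compile(r"[\u0009\u000A\u000D\u0020-\U0010FFFF]")

def first_rejected(document: str, pattern):
    """Return the index of the first character the pattern rejects, or None."""
    for index, char in enumerate(document):
        if pattern.fullmatch(char) is None:
            return index
    return None

doc = '{ field(arg: "\U0001F600") }'
print(first_rejected(doc, CURRENT_SOURCE_CHARACTER))  # index of the emoji
print(first_rejected(doc, WIDENED_SOURCE_CHARACTER))  # None: every character accepted
```

Python strings are sequences of code points, so the emoji is tested as one character. In a language whose strings are UTF-16 code units, the same first pattern would likely match each surrogate half separately, which may be part of why some implementations already accept such documents.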
If supporting a unicode escape for all unicode characters is desired, then one way of handling that is the way swift supports unicode escapes: