Closed mmoghadas closed 8 years ago
Page 5, Figure 5 of EMCA-404 indicates that a String can contain any valid code point except " (quote), \ (reverse solidus), or a control character. I believe that most systems do not consider \t tab to be a control character
For what it's worth, http://JSONLint.com seems to agree that tabs should not appear in strings.
well, I'll say at best the document is inconsistent. I'd think this would be a possible breaking change for existing users though.
Neither spec is especially well-written, and this particular item is already prone to confusion because a tab can either be represented as an escape sequence or as the character itself. My reading is that tabs are allowed in strings only when they are represented by the char sequence "\t". However, I am not inclined to fix this since the bug seems unlikely to likely to break applications, but the fix might. Other opinions?
I'm OK with that. Maybe just place an Errata in the README?
Also, RFC 7159 page 8 specifies a string to be the same as the flowchart#5 indicates on page 5 of EMCA-404.
So I'm OK with leaving it as it is.
gah, nevermind again. I see this in RFC7159:
All Unicode characters may be placed within the quotation marks, except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).
My reading skills need help today. Tab is U+0009, so it's defined as a control character in the spec.
ok, in light that RFC7159 is more explicit and that we already output the tabs properly (https://github.com/stleary/JSON-java/blob/master/JSONObject.java#L1435), I think I may be in favor of throwing an exception when reading JSON with invalid control characters as defined in RFC7159.
It will break backwards compatibility though.
I think that even if the tab is not allowed based on the spec, being permissive when parsing although being strict when stringifying should be the best way. In other words, I believe that accepting the tab character when converting from JSON should be accepted but when JSON is generated from a string, it should escape according to the rules. If I read it right, that's exactly how it is happening right now.
I'd go with @brunoais suggestion and just leave it as-is. Possibly make a note in the README/FAQ to indicate that reading is slightly more lax about control characters than writing. I believe a new-line (U+000A) is still not allowed for reading, but other control characters should be fine (even possibly U+0000). Perhaps also make some test cases to confirm behaviour.
Also of note: RFC7169 page 10 section 9 states that parsers must accept all valid JSON, but may also accept invalid JSON and extensions.
Good plan, I will update the README and FAQ, probably this weekend (busy last week of sprint). If anyone gets to it before me, feel free to open a pull request.
Love the activity! Thanks everyone for their feedback!
see PR #242
also, for a full test case on what control characters are allowed, see https://github.com/stleary/JSON-Java-unit-test/pull/50
in summary, all control characters are allowed except \0, \n, and \r.
Please add future comments to the pull request #242.
If I'm understanding the JSON standards correctly, tabs are not allowed. I'm posting a json to my api (the whitespace at the end of the value is a tab character). This is what's happening in the background:
String string = {"key":"value "}; JSONObject json; try { json = new JSONObject(string); } catch (JSONException e) { }
the assignment above should not succeed, but it does. you can use this sample project to see for yourself https://github.com/mmoghadas/gs-rest-service/blob/master/complete/src/main/java/hello/GreetingController.java#L26
git clone git@github.com:mmoghadas/gs-rest-service.git cd gs-rest-service/complete ./gradlew build java -jar build/libs/gs-rest-service-0.1.0.jar
http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf Insignificant whitespace is allowed before or after any token. The whitespace characters are: character tabulation (U+0009), line feed (U+000A), carriage return (U+000D), and space (U+0020). Whitespace is not allowed within any token, except that space is allowed in strings.