stleary / JSON-java

A reference implementation of a JSON package in Java.
http://stleary.github.io/JSON-java/index.html
Other
4.53k stars 2.56k forks source link

JSONObject: tabs should not be allowed #240

Closed mmoghadas closed 8 years ago

mmoghadas commented 8 years ago

If I'm understanding the JSON standards correctly, tabs are not allowed. I'm posting a json to my api (the whitespace at the end of the value is a tab character). This is what's happening in the background:

String string = {"key":"value "}; JSONObject json; try { json = new JSONObject(string); } catch (JSONException e) { }

the assignment above should not succeed, but it does. you can use this sample project to see for yourself https://github.com/mmoghadas/gs-rest-service/blob/master/complete/src/main/java/hello/GreetingController.java#L26

git clone git@github.com:mmoghadas/gs-rest-service.git cd gs-rest-service/complete ./gradlew build java -jar build/libs/gs-rest-service-0.1.0.jar

http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf Insignificant whitespace is allowed before or after any token. The whitespace characters are: character tabulation (U+0009), line feed (U+000A), carriage return (U+000D), and space (U+0020). Whitespace is not allowed within any token, except that space is allowed in strings.

johnjaylward commented 8 years ago

Page 5, Figure 5 of EMCA-404 indicates that a String can contain any valid code point except " (quote), \ (reverse solidus), or a control character. I believe that most systems do not consider \t tab to be a control character

stleary commented 8 years ago

For what it's worth, http://JSONLint.com seems to agree that tabs should not appear in strings.

johnjaylward commented 8 years ago

well, I'll say at best the document is inconsistent. I'd think this would be a possible breaking change for existing users though.

stleary commented 8 years ago

Neither spec is especially well-written, and this particular item is already prone to confusion because a tab can either be represented as an escape sequence or as the character itself. My reading is that tabs are allowed in strings only when they are represented by the char sequence "\t". However, I am not inclined to fix this since the bug seems unlikely to likely to break applications, but the fix might. Other opinions?

johnjaylward commented 8 years ago

I'm OK with that. Maybe just place an Errata in the README?

johnjaylward commented 8 years ago

Also, RFC 7159 page 8 specifies a string to be the same as the flowchart#5 indicates on page 5 of EMCA-404.

So I'm OK with leaving it as it is.

johnjaylward commented 8 years ago

gah, nevermind again. I see this in RFC7159:

All Unicode characters may be placed within the quotation marks, except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

My reading skills need help today. Tab is U+0009, so it's defined as a control character in the spec.

johnjaylward commented 8 years ago

ok, in light that RFC7159 is more explicit and that we already output the tabs properly (https://github.com/stleary/JSON-java/blob/master/JSONObject.java#L1435), I think I may be in favor of throwing an exception when reading JSON with invalid control characters as defined in RFC7159.

It will break backwards compatibility though.

brunoais commented 8 years ago

I think that even if the tab is not allowed based on the spec, being permissive when parsing although being strict when stringifying should be the best way. In other words, I believe that accepting the tab character when converting from JSON should be accepted but when JSON is generated from a string, it should escape according to the rules. If I read it right, that's exactly how it is happening right now.

johnjaylward commented 8 years ago

I'd go with @brunoais suggestion and just leave it as-is. Possibly make a note in the README/FAQ to indicate that reading is slightly more lax about control characters than writing. I believe a new-line (U+000A) is still not allowed for reading, but other control characters should be fine (even possibly U+0000). Perhaps also make some test cases to confirm behaviour.

Also of note: RFC7169 page 10 section 9 states that parsers must accept all valid JSON, but may also accept invalid JSON and extensions.

stleary commented 8 years ago

Good plan, I will update the README and FAQ, probably this weekend (busy last week of sprint). If anyone gets to it before me, feel free to open a pull request.

mmoghadas commented 8 years ago

Love the activity! Thanks everyone for their feedback!

johnjaylward commented 8 years ago

see PR #242

johnjaylward commented 8 years ago

also, for a full test case on what control characters are allowed, see https://github.com/stleary/JSON-Java-unit-test/pull/50

in summary, all control characters are allowed except \0, \n, and \r.

stleary commented 8 years ago

Please add future comments to the pull request #242.