Open GoogleCodeExporter opened 9 years ago
Hi, I found a document that covers this in more depth. Its first paragraph says that the characters that must be escaped (for example, by prefixing them with '\') are the quotation mark, the reverse solidus, and the control characters (U+0000 through U+001F). So I think the diagram on www.json.org isn't as rigorous as RFC 4627.
As for the solidus '/' (U+002F), it isn't in that list, so I guess escaping it is optional; json-lib doesn't escape it either.
I'm going to check whether other clients, such as the iPhone app and the PHP front end, can deserialize JSON encoded by Gson when '/' is left unescaped. (I think php-json can, and I haven't had any complaints from the app team about it yet. :) )
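To make the "must escape" list concrete, here is a minimal Java sketch of an encoder that escapes only what RFC 4627 section 2.5 requires (quotation mark, reverse solidus, and U+0000 through U+001F) and writes '/' as-is. The class and method names are hypothetical; this is an illustration, not Gson's actual implementation.

```java
public final class MinimalJsonEscaper {

    /** Returns s as a quoted JSON string, escaping only the characters RFC 4627 requires. */
    public static String quote(String s) {
        StringBuilder out = new StringBuilder(s.length() + 2);
        out.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break; // quotation mark
                case '\\': out.append("\\\\"); break; // reverse solidus
                case '\b': out.append("\\b");  break;
                case '\f': out.append("\\f");  break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Other control characters have no short form, so use the
                        // six-character form: backslash, 'u', four hex digits.
                        out.append(String.format("\\u%04X", (int) c));
                    } else {
                        // Everything else, including the solidus '/', may appear unescaped.
                        out.append(c);
                    }
            }
        }
        out.append('"');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(quote("say \"hi\" to a/b\n"));
    }
}
```

Escaping '/' anyway (as "\/") would also be valid output; leaving it bare is simply the minimal choice the RFC permits.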
Quoted from http://www.ietf.org/rfc/rfc4627.txt:
2.5. Strings

The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point. The hexadecimal letters A though F can be upper or lowercase. So, for example, a string containing only a single reverse solidus character may be represented as "\u005C".

Alternatively, there are two-character sequence escape representations of some popular characters. So, for example, a string containing only a single reverse solidus character may be represented more compactly as "\\".

To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair. So, for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".
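Because Java strings are already UTF-16, the six- and twelve-character forms described above fall out naturally when each char is written as a hex escape. A minimal sketch (the class name UnicodeEscapeDemo is hypothetical) that escapes every UTF-16 code unit:

```java
public final class UnicodeEscapeDemo {

    /** Escapes every UTF-16 code unit of s as backslash, 'u', and four uppercase hex digits. */
    public static String escapeAll(String s) {
        StringBuilder out = new StringBuilder(s.length() * 6);
        for (int i = 0; i < s.length(); i++) {
            // A character outside the BMP occupies two chars (a surrogate pair),
            // so it comes out as two six-character escapes: twelve characters total.
            out.append(String.format("\\u%04X", (int) s.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String gClef = new String(Character.toChars(0x1D11E)); // MUSICAL SYMBOL G CLEF
        System.out.println(escapeAll(gClef)); // the twelve-character surrogate-pair form
        System.out.println(escapeAll("\\"));  // the six-character form of a reverse solidus
    }
}
```

Running it should print the same two sequences the RFC uses as examples for the G clef and the reverse solidus.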
Crockford Informational [Page 4]
RFC 4627 JSON July 2006
      string = quotation-mark *char quotation-mark

      char = unescaped /
             escape (
                 %x22 /          ; "    quotation mark  U+0022
                 %x5C /          ; \    reverse solidus U+005C
                 %x2F /          ; /    solidus         U+002F
                 %x62 /          ; b    backspace       U+0008
                 %x66 /          ; f    form feed       U+000C
                 %x6E /          ; n    line feed       U+000A
                 %x72 /          ; r    carriage return U+000D
                 %x74 /          ; t    tab             U+0009
                 %x75 4HEXDIG )  ; uXXXX                U+XXXX

      escape = %x5C              ; \

      quotation-mark = %x22      ; "

      unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
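The grammar also shows the decoder's side: "\/" is a legal escape, but '/' is also in the unescaped range, so both spellings decode to the same character. A minimal Java decoder for the `string` rule, assuming the input is a single quoted token (JsonStringDecoder is a hypothetical name, not Gson's JsonReader). It works on UTF-16 code units, so it does not check that surrogate escapes are properly paired.

```java
public final class JsonStringDecoder {

    private static boolean isHexDigit(char ch) {
        return (ch >= '0' && ch <= '9') || (ch >= 'a' && ch <= 'f') || (ch >= 'A' && ch <= 'F');
    }

    /** Decodes a quoted JSON string token; throws IllegalArgumentException if it breaks the grammar. */
    public static String decode(String token) {
        int n = token.length();
        if (n < 2 || token.charAt(0) != '"' || token.charAt(n - 1) != '"') {
            throw new IllegalArgumentException("string must begin and end with a quotation mark");
        }
        int end = n - 1; // index of the closing quotation mark
        StringBuilder out = new StringBuilder();
        int i = 1;
        while (i < end) {
            char c = token.charAt(i);
            if (c == '\\') {
                if (i + 1 >= end) throw new IllegalArgumentException("dangling escape");
                char e = token.charAt(i + 1);
                switch (e) {
                    case '"':  out.append('"');  break;
                    case '\\': out.append('\\'); break;
                    case '/':  out.append('/');  break; // legal, but never required
                    case 'b':  out.append('\b'); break;
                    case 'f':  out.append('\f'); break;
                    case 'n':  out.append('\n'); break;
                    case 'r':  out.append('\r'); break;
                    case 't':  out.append('\t'); break;
                    case 'u': {
                        // %x75 4HEXDIG: exactly four hex digits, upper or lowercase.
                        if (i + 6 > end) throw new IllegalArgumentException("truncated u-escape");
                        int value = 0;
                        for (int k = i + 2; k < i + 6; k++) {
                            char h = token.charAt(k);
                            if (!isHexDigit(h)) throw new IllegalArgumentException("bad hex digit: " + h);
                            value = value * 16 + Character.digit(h, 16);
                        }
                        out.append((char) value);
                        i += 4; // skip the hex digits; the += 2 below skips the backslash and 'u'
                        break;
                    }
                    default:
                        throw new IllegalArgumentException("invalid escape character: " + e);
                }
                i += 2;
            } else if (c == '"' || c < 0x20) {
                // Outside "unescaped": a bare quotation mark or a control character.
                throw new IllegalArgumentException("character must be escaped: code " + (int) c);
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

With this sketch, the RFC's two spellings of a lone reverse solidus ("\u005C" and "\\") decode to the same one-character string, as do "a\/b" and "a/b".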
Original comment by diabl...@gmail.com on 18 Aug 2011 at 2:32
Original issue reported on code.google.com by jessewil...@google.com on 17 Aug 2011 at 7:40