shau-lok / google-gson

Automatically exported from code.google.com/p/google-gson

Decide what to do about '/' #356

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
One of our users correctly pointed out that we're disagreeing with the standard 
in our treatment of the '/' character. 

https://groups.google.com/forum/#!topic/google-gson/c77R5HgDk3o

This isn't a regression; as far as I can tell, we've always had this behavior. 
In fact, android.util.JsonReader was changed to do it this way for consistency 
with GSON in this change: eb97c0ddc063176c26065fc6855188edf0c16e03

Original issue reported on code.google.com by jessewil...@google.com on 17 Aug 2011 at 7:40

GoogleCodeExporter commented 9 years ago
Hi, I found a document that discusses this in more depth. Its first paragraph 
says that the "quotation mark, reverse solidus, and the control characters 
(U+0000 through U+001F)" are the characters that must be escaped (for example, 
by putting a '\' in front of them). So I think the diagram on www.json.org 
isn't as rigorous as RFC 4627.

The solidus '/' is U+002F, which is outside that set, so I guess it can be 
ignored (left unescaped). I think json-lib also ignores it.

I am going to check whether other clients, such as an iPhone app and a PHP 
front end, can deserialize the JSON encoded by Gson when '/' is ignored. (I 
think php-json can deserialize it, and I haven't yet had any complaints from 
the app team about this. :) )

quoted from http://www.ietf.org/rfc/rfc4627.txt

2.5.  Strings
   The representation of strings is similar to conventions used in the C
   family of programming languages.  A string begins and ends with
   quotation marks.  All Unicode characters may be placed within the
   quotation marks except for the characters that must be escaped:
   quotation mark, reverse solidus, and the control characters (U+0000
   through U+001F).

   Any character may be escaped.  If the character is in the Basic
   Multilingual Plane (U+0000 through U+FFFF), then it may be
   represented as a six-character sequence: a reverse solidus, followed
   by the lowercase letter u, followed by four hexadecimal digits that
   encode the character's code point.  The hexadecimal letters A though
   F can be upper or lowercase.  So, for example, a string containing
   only a single reverse solidus character may be represented as
   "\u005C".

   Alternatively, there are two-character sequence escape
   representations of some popular characters.  So, for example, a
   string containing only a single reverse solidus character may be
   represented more compactly as "\\".

   To escape an extended character that is not in the Basic Multilingual
   Plane, the character is represented as a twelve-character sequence,
   encoding the UTF-16 surrogate pair.  So, for example, a string
   containing only the G clef character (U+1D11E) may be represented as
   "\uD834\uDD1E".
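
To make the six-character and twelve-character forms concrete, here is a minimal Java sketch that turns a code point into its escape sequence(s). The class and method names (`SurrogateEscapeSketch`, `escapeCodePoint`) are made up for illustration and are not part of Gson or the RFC; the only library call it relies on is the standard `Character.toChars`.

```java
final class SurrogateEscapeSketch {

    /**
     * Returns the backslash-u escape(s) for a single Unicode code point:
     * one six-character sequence for a BMP character, or two of them
     * (twelve characters) for a character outside the BMP.
     */
    static String escapeCodePoint(int codePoint) {
        StringBuilder out = new StringBuilder();
        // Character.toChars returns one UTF-16 unit for BMP code points
        // and a high/low surrogate pair for everything above U+FFFF.
        for (char unit : Character.toChars(codePoint)) {
            out.append(String.format("\\u%04X", (int) unit));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeCodePoint(0x1D11E)); // G clef, outside the BMP
        System.out.println(escapeCodePoint(0x5C));    // reverse solidus, inside the BMP
    }
}
```

For the G clef this yields the same twelve-character sequence as the RFC example, and for the reverse solidus it yields the six-character form.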


         string = quotation-mark *char quotation-mark

         char = unescaped /
                escape (
                    %x22 /          ; "    quotation mark  U+0022
                    %x5C /          ; \    reverse solidus U+005C
                    %x2F /          ; /    solidus         U+002F
                    %x62 /          ; b    backspace       U+0008
                    %x66 /          ; f    form feed       U+000C
                    %x6E /          ; n    line feed       U+000A
                    %x72 /          ; r    carriage return U+000D
                    %x74 /          ; t    tab             U+0009
                    %x75 4HEXDIG )  ; uXXXX                U+XXXX

         escape = %x5C              ; \

         quotation-mark = %x22      ; "

         unescaped = %x20-21 / %x23-5B / %x5D-10FFFF 
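
Read against this grammar, '/' (%x2F) falls inside the unescaped range %x23-5B and also appears in the list of permitted escapes, so both "/" and "\/" are valid ways to write it. The following Java sketch illustrates that reading; it is not Gson's implementation, the names `SolidusEscapeSketch`, `escape`, and `unescape` are hypothetical, and the decoder is deliberately lenient (it does not reject raw control characters or unescaped quotes).

```java
final class SolidusEscapeSketch {

    /** Encodes string contents, escaping only what the grammar requires. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);          // must be escaped
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c)); // control characters
            } else {
                out.append(c);                       // includes '/', left as-is
            }
        }
        return out.toString();
    }

    /** Decodes string contents, accepting every escape in the grammar. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 1 >= s.length()) {
                throw new IllegalArgumentException("dangling escape");
            }
            char e = s.charAt(++i);
            switch (e) {
                case '"': case '\\': case '/': out.append(e); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'u': {
                    if (i + 4 >= s.length()) {
                        throw new IllegalArgumentException("truncated unicode escape");
                    }
                    int value = 0;
                    for (int k = 1; k <= 4; k++) {
                        int d = Character.digit(s.charAt(i + k), 16);
                        if (d < 0) {
                            throw new IllegalArgumentException("bad hex digit");
                        }
                        value = value * 16 + d;
                    }
                    out.append((char) value);
                    i += 4; // skip the four hex digits just consumed
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown escape: " + e);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a/b"));      // solidus is not escaped on output
        System.out.println(unescape("a\\/b"));  // escaped solidus is accepted on input
    }
}
```

Both calls in `main` print a/b: the encoder leaves the solidus alone, and the decoder treats the escaped and unescaped forms as the same character.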

Original comment by diabl...@gmail.com on 18 Aug 2011 at 2:32