sile / jsone

Erlang JSON library
MIT License

encoding bug? #29

Closed benoitc closed 6 years ago

benoitc commented 7 years ago

I have a test that automatically generates some values, and I got the following error:

1> jsone:encode(#{<<"id">> => <<"aaaaaa">>,<<"v">> => <<128>>}).
** exception error: bad argument
     in function  jsone_encode:escape_string/4
        called as jsone_encode:escape_string(<<128>>,
                                             [{object_members,[]}],
                                             <<"{\"id\":\"aaaaaa\",\"v\":\"">>,
                                             {encode_opt_v2,false,false,
                                                            [{scientific,20}],
                                                            {iso8601,0},
                                                            string,0,0,false})

I'm not sure at this point whether this is expected. Let me know :)

pichi commented 7 years ago

<<128>> doesn't seem to be a valid UTF-8 string. I'm not sure, but I believe the JSON specification doesn't allow encoding a string that is not valid UTF-8. For example, jiffy rejects it:

1> application:start(jiffy).
ok
2> jiffy:encode(#{<<"foo">>=><<128>>}).
** exception throw: {error,{invalid_string,<<128>>}}
     in function  jiffy:encode/2 (src/jiffy.erl, line 97)
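
To illustrate why <<128>> is rejected: the byte 16#80 has the bit pattern 10xxxxxx, which UTF-8 reserves for continuation bytes, so it cannot start a character. A minimal sketch of a validity check using Erlang's bit syntax follows. The module and function names (utf8_check, is_utf8/1) are hypothetical and not part of jsone or jiffy.

```erlang
-module(utf8_check).
-export([is_utf8/1]).

%% Hypothetical helper, not part of jsone or jiffy.
%% Walks the binary one code point at a time. The /utf8 segment type
%% only matches a well-formed UTF-8 sequence, so any malformed byte
%% falls through to the last clause.
is_utf8(<<>>) ->
    true;
is_utf8(<<_/utf8, Rest/binary>>) ->
    is_utf8(Rest);
is_utf8(Bin) when is_binary(Bin) ->
    false.
```

Under these assumptions, utf8_check:is_utf8(<<"aaaaaa">>) matches the first two clauses until the binary is empty, while utf8_check:is_utf8(<<128>>) falls through to the final clause.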

benoitc commented 7 years ago

You're right, IMO. JSX accepts it, but reading the spec, it probably shouldn't: https://tools.ietf.org/html/rfc7159#section-7

All Unicode characters may be placed within the quotation marks, except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

sile commented 7 years ago

As already mentioned, the JSON specification (RFC 7159) defines a JSON string as a sequence of Unicode characters, and UTF-8 is the recommended encoding.

1. Introduction

A string is a sequence of zero or more Unicode characters [UNICODE].

8.1. Character Encoding

JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8, and JSON texts that are encoded in UTF-8 are interoperable in the sense that they will be read successfully by the maximum number of implementations; there are many implementations that cannot successfully read texts in other encodings (such as UTF-16 and UTF-32).

So I think the behaviour reported by @benoitc is correct for a JSON implementation. And, for simplicity, I would like jsone to support only UTF-8 strings.
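
Since only UTF-8 strings are supported, a caller that generates values automatically may want to find the offending entries before encoding. The following is a rough sketch, assuming the input is a flat map like the one in the report. The module and function names (utf8_filter, invalid_keys/1) are hypothetical; only unicode:characters_to_binary/3 and maps:to_list/1 come from the Erlang standard library.

```erlang
-module(utf8_filter).
-export([invalid_keys/1]).

%% Hypothetical helper, not part of jsone.
%% Returns the keys of a flat map whose binary values are not
%% well-formed UTF-8. Non-binary values are ignored.
invalid_keys(Map) when is_map(Map) ->
    [K || {K, V} <- maps:to_list(Map), is_binary(V), not is_utf8(V)].

%% unicode:characters_to_binary/3 returns a binary on success and an
%% {error, _, _} or {incomplete, _, _} tuple on malformed input.
is_utf8(Bin) ->
    case unicode:characters_to_binary(Bin, utf8, utf8) of
        Out when is_binary(Out) -> true;
        _ -> false
    end.
```

For the map from the report, this should single out the <<"v">> key, and a generator could then drop or regenerate that value before calling jsone:encode/1.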