lloyd / yajl

A fast streaming JSON parsing library in C.
http://lloyd.github.com/yajl
ISC License

incorrect set of white space characters #180

Open eriksjolund opened 8 years ago

eriksjolund commented 8 years ago

The JSON format specification https://tools.ietf.org/html/rfc7159 talks about these white space characters:

  ws = *(
          %x20 /              ; Space
          %x09 /              ; Horizontal tab
          %x0A /              ; Line feed or New line
          %x0D )              ; Carriage return

It seems to me that yajl also allows for \v (vertical tab) and \f (form feed).

$ grep '\\v' yajl/src/yajl_lex.c 
            case '\t': case '\n': case '\v': case '\f': case '\r': case ' ':
$
$ od -c /tmp/aa.txt 
0000000  \v  \f       [      \v  \f           1       ]  \n
0000015
$ cat /tmp/aa.txt|json_verify 
JSON is valid
$ 

It looks like a bug to me.
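To make the difference concrete, here is a minimal sketch in C that compares the two sets. It is not yajl code: the function names `is_rfc7159_ws`, `is_yajl_lex_ws`, and `is_extra_ws` are hypothetical, and the second function only mirrors the case labels from the single `yajl_lex.c` line shown in the grep output above.

```c
#include <stdbool.h>

/* Whitespace as listed in the RFC 7159 `ws` rule: space, tab, LF, CR. */
static bool is_rfc7159_ws(unsigned char c)
{
    return c == 0x20 || c == 0x09 || c == 0x0A || c == 0x0D;
}

/* Hypothetical helper mirroring the case labels from the grep output. */
static bool is_yajl_lex_ws(unsigned char c)
{
    switch (c) {
    case '\t': case '\n': case '\v': case '\f': case '\r': case ' ':
        return true;
    default:
        return false;
    }
}

/* True for bytes in the lexer's set that the RFC does not list. */
static bool is_extra_ws(unsigned char c)
{
    return is_yajl_lex_ws(c) && !is_rfc7159_ws(c);
}
```

Under these assumptions, `is_extra_ws` is true only for `'\v'` and `'\f'`, which are the two bytes in question.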

eriksjolund commented 8 years ago

Looking in https://tools.ietf.org/html/rfc7159 I was surprised to see: "A JSON parser MAY accept non-JSON forms or extensions"

Anyway, I think it is still better to fail on non-JSON input than to silently accept the input.

drjasonharrison commented 8 years ago

I'd prefer that the extended set of white space be documented as a feature rather than have the parser break on files created or edited on Windows. I have no love for the carriage return plus line feed convention, but it annoyingly shows up in inputs to my programs.

An alternative is a strict mode.

-Jason

eriksjolund commented 8 years ago

Carriage return (\r) and line feed (\n) are allowed white space characters according to RFC 7159. I was referring to form feed (\f) and vertical tab (\v). I don't know how common they are or what they are normally used for.

Yes, having two parsing modes, strict mode and extension mode, is an alternative.
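A rough sketch of what such a switch could look like in C follows. Everything here is hypothetical: `ws_mode`, `WS_STRICT`, `WS_EXTENDED`, `is_ws`, and `skip_ws` are made-up names for illustration and are not part of the yajl API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mode switch; not an existing yajl option. */
typedef enum { WS_STRICT, WS_EXTENDED } ws_mode;

static bool is_ws(unsigned char c, ws_mode mode)
{
    switch (c) {
    case ' ': case '\t': case '\n': case '\r':
        return true;                 /* RFC 7159 whitespace, always accepted */
    case '\v': case '\f':
        return mode == WS_EXTENDED;  /* extension, accepted only when asked */
    default:
        return false;
    }
}

/* Returns the offset of the first non-whitespace byte in buf[0..len). */
static size_t skip_ws(const char *buf, size_t len, ws_mode mode)
{
    size_t i = 0;
    while (i < len && is_ws((unsigned char)buf[i], mode))
        i++;
    return i;
}
```

With the input `"\v\f [1]"`, `skip_ws` in `WS_EXTENDED` mode stops at the `[`, while in `WS_STRICT` mode it stops at offset 0, so the lexer would see the vertical tab as an unexpected character and could report an error.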

lamont-granquist commented 8 years ago

Strict mode would be fine, but ideally not by default. Since JSON is commonly used in network APIs, "Be conservative in what you send, be liberal in what you accept" should be the default behavior.