miloyip / nativejson-benchmark

C/C++ JSON parser/generator benchmark
MIT License

Tests `fail07`, `fail08` and `fail10` are invalid #98

Closed boazsegev closed 5 years ago

boazsegev commented 6 years ago

The tests for fail07 and fail08 seem invalid.

The extra comma / closing bracket comes after the JSON object, and a conforming parser is allowed to stop between objects, indicating that a complete JSON object was consumed.

This is especially true for streaming parsers that allow incomplete data (as well as overflowing data) to be fed to the parser.

The same applies to fail10.

nlohmann commented 6 years ago

This depends on the question you want a test to answer.

RFC 7159 defines a JSON text as

ws value ws

That is, after a value (e.g. ["Comma after the close"]) there may only be whitespace. Of course, you can assume that it is OK if a file contains additional characters, but this is just an assumption. And since the test files in question stem from json.org, it is fair to require JSON libraries to conform to these exact files.

boazsegev commented 6 years ago

@nlohmann ,

Thank you for the reply and for arguing the point from the other perspective.

Please consider my counter arguments.

As to Whitespace:

I don't know about the source of the test files, but the ws in the RFC should be read as "zero or more" (notice the * beginning the list, and the keywords "insignificant" and "allowed" rather than "required"; I added emphasis):

Insignificant whitespace is allowed before or after any of the six structural characters

  ws = *(
         %x20 /              ; Space
         %x09 /              ; Horizontal tab
         %x0A /              ; Line feed or New line
         %x0D )              ; Carriage return

This signifies that a parser isn't supposed to require whitespace after the JSON value.

See also the pass02 test, which contains no whitespace and is considered valid JSON.

Also, please consider that these tests add an extra requirement - they require the parser to consume data after a JSON object was completely parsed.

This requirement is biased against a class of parsers - all the parsers that allow streams to be parsed - since the rest of the data (including the expected error) will only be parsed on the next call to the parser.

Again, I'm not saying that the JSON in these files is valid - only that parsers shouldn't be expected to detect the invalidity the first time they "consume" the data (which is how parser conformance is currently tested).

nlohmann commented 6 years ago

I did not assume that a JSON text must be surrounded by whitespace. And I also do not deny that there may exist situations in which it is OK to accept a prefix of an input file. But at the same time, one may require strict JSON parsers that assume that all characters in a file form a JSON text.

If a parser makes another assumption, then I'm fine with this. But if it is unable to detect that an input file does not consist of a valid JSON text, then this is not the test's fault.

boazsegev commented 6 years ago

I think it comes down to this question: are parsers required to consume all the data they are fed?

I believe the answer is "no".

I think a common enough design for parsers is to stop at item completion (much like the C function strtod).

I posted an issue (rather than a PR) because I felt this question isn't mine to answer.

I hope you consider my point of view. Whatever you decide, I'm very happy for this tool and I'm very thankful.

miloyip commented 6 years ago

From another perspective, consider that RFC 7159 also relaxed the rules so that the root of a JSON text can be a value of any type (not limited to object or array). So, if a parser does not parse the whole JSON text, the following input will be considered valid:

123a

It would be parsed as the number 123 (according to the BNF), and then the parser could terminate and report the JSON as valid. This may not be what the user expects.

To fulfill RFC 7159, I agree with @nlohmann that only whitespace is allowed between the end of the root value and the end of the JSON text. However, some parsers may provide a "relaxed" mode that stops parsing once the root value is complete, such as kParseStopWhenDoneFlag in RapidJSON, as requested by some users.

boazsegev commented 6 years ago

Hi @miloyip ,

Thank you for joining the discussion and your perspective.

I want to point out that it's clear to me that fail07, fail08 and fail10 represent invalid JSON.

However, I believe these tests don't represent parser failures unless there is a requirement for parsers to consume all available data.

IMHO, the user's expectations about data consumption are attached to the parser's API (its design) and are not absolute.

If I am correct, then the fail07, fail08 and fail10 tests are biased towards a specific API design.

The example 123a is a good one when it comes to expectations. I also think it's similar to 123", 123{, etc.

I think it's reasonable to expect the parser to raise an error (for example, "illegal token" or "incomplete data")... but the question is when a user should expect the error to be raised.

If the API is similar to strtod (which would be normal for streaming parsers), my expectation would be that data would be parsed until completion of the root value. I would expect the error to be raised on the second call to the parser.

I think this expectation is valid, because the parser also lets me know that not all the data was consumed, so I can test whether the remainder of the data is valid JSON (another object) or junk (something went wrong).

On the other hand, if the parser's API doesn't inform me about the amount of data consumed (a non-streaming parser), I might expect the parser to consume all the data and report an error on my first call to the parser.

Please note, this is just my opinion. I am very thankful for your work and I don't want to argue needlessly. Please feel free to close the issue if you don't want to consider this change.