ibireme / yyjson

The fastest JSON library in C
https://ibireme.github.io/yyjson/doc/doxygen/html/
MIT License
3.04k stars 262 forks source link

Read JSON with big number `"large:" 1e999` shuold be allowed by default #124

Closed Zzzode closed 1 year ago

Zzzode commented 1 year ago

Describe the bug Here is a simple reproduce:

const char* buf = "{\"large\":1e999}";
size_t len = strlen(buf);
yyjson_doc *doc = yyjson_read(buf, len, 0);

In this example, the doc variable ends up being NULL. However, according to the ECMA 404 specification,

Numeric values that cannot be represented as sequences of digits (such as Infinity and NaN) are not permitted.

There is no provision that explicitly states that the normal numeric representation of 1e999 is unacceptable, only that the representation of Inf and NaN are illegal.

Furthermore, while RFC 8259 allows implementations to impose limits on the range and precision of numbers accepted, the most popular JavaScript engine implementation accepts large numbers like 123e999.

I notice that there is an option YYJSON_READ_ALLOW_INF_AND_NAN to read large numbers, but it also supports reading illegal Inf NaN characters, which is not what we want. As such, it may be reasonable to accept big numbers like 1e999 when reading JSON by default.

Your environment

Additional context ECMA-404 RFC-8259

TkTech commented 1 year ago

You're looking for YYJSON_READ_NUMBER_AS_RAW. You can see a practical example of using it in py_yyjson:

ibireme commented 1 year ago

According to RFC 8259, implementations can set limits on the range and precision of numbers accepted. This means that either accepting or rejecting large numbers is in line with the standard. The most common numeric types for C are only 64-bit int64, uint64, and double, so most C/C++ JSON libraries limit the precision to this range. For JavaScript, V8 also limits the numbers to the double range:

> JSON.stringify(123e999) 
< "null"
> JSON.parse("123e999") 
< Infinity

If you want to handle the numbers yourself, you can use YYJSON_READ_NUMBER_AS_RAW as @TkTech suggested, which will parse all numbers into RAW type strings. If you only want to reject Inf NaN literals but accept large numbers, it is not supported now, but maybe we can add a new flag YYJSON_READ_BIGNUM_AS_RAW or YYJSON_READ_BIGNUM_AS_NULL.

Could you tell me more about your application scenario?

Zzzode commented 1 year ago

@ibireme

For JavaScript, V8 also limits the numbers to the double range

In the example provided, v8 parsed 123e999 to Infinity, which is a behavior defined by the ecma262 standard for JSON.parse. The standard stipulates that JavaScript reads a valid JSON string and then does the PARSE, which is similar to evaluating (123e999) in JS, converting the JSON Value into the JavaScript value. 123e999 is converted to Infinity at this time.

Therefore, I think this behavior of v8 parse 123e999 to Infinity is actually due to ecma262, not ECMA-404 or RFC-8259.

I am working on JSON optimization for a self-developed JavaScript virtual machine, expecting to use yyjson as the json parser and writer backend. But because of the issue I mentioned above, I can't just parse a '{"a": 123e999}' with yyjson and generate the data structure I need because yyjson_read will just return NULL or yyjson_read_err. I think translate 123e999 to Inf is a behavior that should be handled by my JS Engine instead of yyjson.

Since ecma262 determines whether json is valid based on the ECMA-404, which is somewhat different from RFC-8259, it does not reject a large number like 1e999 at parse, which is a problem I ran into.

I'll try YYJSON_READ_NUMBER_AS_RAW as a workaround, but I think the issue I mentioned is real, and maybe we could have more discussion about it. Add a new flag YYJSON_READ_BIGNUM_AS_RAW would also be a good idea, maybe we can try to implement it.

Zzzode commented 1 year ago

@TkTech

Thanks, I'll try that

ibireme commented 1 year ago

YYJSON_READ_BIGNUM_AS_RAW flag has been added: https://github.com/ibireme/yyjson/commit/f2525dcb2390243555b5fa4cafcf689e618d990e

Zzzode commented 1 year ago

@ibireme Well done! I've tried this flag and it worked! Thx 🎉🎉🎉!