Open ly9chee opened 12 months ago
https://datatracker.ietf.org/doc/html/rfc8259#section-6
> This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754 binary64 (double precision) numbers [IEEE754] is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.
>
> Note that when such software is used, numbers that are integers and are in the range [-(2**53)+1, (2**53)-1] are interoperable in the sense that implementations will agree exactly on their numeric values.
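For illustration, a small Rust check of where that boundary bites (2^53 is the point past which binary64 can no longer represent every integer exactly):

```rust
fn main() {
    // (2^53)-1 is the top of the range RFC 8259 calls interoperable.
    let max_safe: i64 = (1_i64 << 53) - 1;
    assert_eq!(max_safe as f64 as i64, max_safe); // round-trips exactly

    // (2^53)+1 has no binary64 representation: it silently rounds to 2^53.
    let past: i64 = (1_i64 << 53) + 1;
    assert_ne!(past as f64 as i64, past);
    println!("{} -> {}", past, past as f64 as i64);
    // prints: 9007199254740993 -> 9007199254740992
}
```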
Also note that proto3 maps 64-bit integers to JSON strings rather than JSON numbers: https://protobuf.dev/programming-guides/proto3/#:~:text=int64%2C%20fixed64%2C%20uint64,string
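In Rust terms, the same convention can be followed with serde_json by emitting 64-bit values as strings (a sketch; the field name is made up):

```rust
use serde_json::json;

fn main() {
    let id: i64 = i64::MAX; // 9223372036854775807, far beyond 2^53
    // Mirror proto3's JSON mapping for int64/uint64/fixed64: send the
    // value as a string so no consumer needs to fit it into binary64.
    let event = json!({ "id": id.to_string() });
    println!("{event}"); // {"id":"9223372036854775807"}
}
```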
In the long term we do not want to be the bottleneck that limits JSON numbers to 2^53, but for good interoperability it is better not to assume every tool in the data pipeline can handle more.
> or should this field be replaced by NULL instead of a parse error?
Currently a parse error during ingestion (source) causes the whole row to be skipped, while an expression error (including casting from string to some type) during computation (stream) fills that field with NULL. This may feel inconsistent; thanks for bringing it up.
As the user suggests, simd-json 0.13.5 now enables us to convert very large numbers into f64, with a potential loss of precision.
> big-int-as-float: The big-int-as-float feature flag treats very large integers that won't fit into u64 as f64 floats. This prevents parsing errors if the JSON you are parsing contains very large numbers. Keep in mind that f64 loses some precision when representing very large numbers.
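A sketch of what opting in looks like (the version pin and payload are illustrative):

```rust
// Cargo.toml:
// simd-json = { version = "0.13.5", features = ["big-int-as-float"] }

fn main() {
    // 2^64 does not fit into u64. Without the feature flag this is a
    // parse error; with big-int-as-float it parses as an approximate f64.
    let mut data = br#"{"id": 18446744073709551616}"#.to_vec();
    match simd_json::to_borrowed_value(&mut data) {
        Ok(v) => println!("parsed (possibly lossy): {v:?}"),
        Err(e) => println!("parse error: {e}"),
    }
}
```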
But I still think it's not a good way to deal with such numbers. Converting them to a decimal type (https://crates.io/crates/bigdecimal) may be better.
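A sketch of that direction with the bigdecimal crate (illustrative, not what the codebase does today): keep the literal as text and parse it losslessly.

```rust
use bigdecimal::BigDecimal;
use std::str::FromStr;

fn main() {
    // Every digit survives, unlike a round-trip through f64.
    let raw = "3.141592653589793238462643383279";
    let d = BigDecimal::from_str(raw).expect("valid decimal literal");
    println!("{d}");
}
```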
Describe the bug
Hi there, recently I found some parse error logs in the CN (compute node) like the ones below.
The JSON data mentioned in the log (at partition 0, offset 8 of the Kafka topic) is skipped by the Kafka source, eventually resulting in data loss.
After some investigation of the source code I realized that this is a simd-json parse error caused by a field holding a large integer value, so I wrote a minimal Rust snippet to reproduce the error:
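A minimal sketch along those lines (simd-json without the big-int-as-float feature; the exact original snippet may have differed):

```rust
fn main() {
    // A nested field holding an integer that fits in neither i64 nor u64.
    let mut data = br#"{"payload": {"id": 99999999999999999999999999}}"#.to_vec();
    match simd_json::to_borrowed_value(&mut data) {
        Ok(v) => println!("parsed: {v:?}"),
        Err(e) => println!("parse error: {e}"), // this branch is taken
    }
}
```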
output:
In our Kafka upstream, JSON data may contain large integer values inside nested objects; it is very hard to detect those values and correct them before ingestion (one possible sanitizing approach is sketched below).
So, is this expected behavior? Or should this field be replaced by NULL instead of a parse error?
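As an aside, one conceivable pre-ingestion workaround is to sanitize the stream before it reaches the source, rewriting oversized integers as strings. A sketch using serde_json's arbitrary_precision feature (all names illustrative; this is not something RisingWave provides):

```rust
use serde_json::Value;

// Rewrite any integer that fits in neither i64 nor u64 as a JSON string.
// Requires serde_json with features = ["arbitrary_precision"]; without it
// the original digits are already rounded to f64 at parse time.
fn stringify_big_ints(v: &mut Value) {
    match v {
        Value::Array(items) => items.iter_mut().for_each(stringify_big_ints),
        Value::Object(map) => map.values_mut().for_each(stringify_big_ints),
        Value::Number(n) => {
            let lit = n.to_string();
            // Leave float literals alone; only rewrite oversized integers.
            let is_int = !lit.contains(|c: char| c == '.' || c == 'e' || c == 'E');
            if is_int && n.as_i64().is_none() && n.as_u64().is_none() {
                *v = Value::String(lit);
            }
        }
        _ => {}
    }
}

fn main() {
    let mut doc: Value = serde_json::from_str(
        r#"{"payload": {"id": 99999999999999999999999999}, "pi": 3.14}"#,
    )
    .unwrap();
    stringify_big_ints(&mut doc);
    println!("{doc}");
    // {"payload":{"id":"99999999999999999999999999"},"pi":3.14}
}
```

The trade-off is that the field's JSON type changes from number to string, so downstream consumers have to agree on that convention, which is exactly what proto3 does for 64-bit integers.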
Error message/log
No response
To Reproduce
No response
Expected behavior
No response
How did you deploy RisingWave?
Kubernetes
The version of RisingWave
PostgreSQL 9.5-RisingWave-1.3.0 (c4c31bdc5e8763ae65ec23293e8c07bdfd4ab4df)
Additional context
No response