jeroen / jsonlite

A Robust, High Performance JSON Parser and Generator for R
http://arxiv.org/abs/1403.2805
Other
379 stars 40 forks source link

Hardcoded 15 significant digits in num_to_char.c #412

Open cat-zeppelin opened 1 year ago

cat-zeppelin commented 1 year ago

Accordingly to Standard for Floating-Point Arithmetic (IEEE754) if a double-precision number converted to string with 17 significant digits and then converted back to double-precision representation, the final result must match the original number.

But unfortunatelly in numerous places in num_to_char.c (e.g. row 30 int sig_digits = signif ? ceil(fmin(15, precision)) : 0;) ) the number of significant digits limited to 15, so that it is impossible to serialize a double precision number to json and deserialize it to the original value.

The current implementation returns FALSE for the simple check

json <- jsonlite::toJSON(pi, digits = I(15))
pi2 <- jsonlite::fromJSON(json)
identical(pi, pi2)
# > [1] FALSE

I've rebuilt the package locally with fixed 15 to 17 in num_to_char.c, and now it works as it must work. This code returns TRUE

json <- jsonlite::toJSON(pi, digits = I(17))
pi2 <- jsonlite::fromJSON(json)
identical(pi, pi2)
# > [1] TRUE
jeroen commented 1 year ago

Can you show us the json string for both examples?

cat-zeppelin commented 1 year ago

jsonlite::toJSON(pi, digits = I(17)) [3.1415926535897931] jsonlite::toJSON(pi, digits = I(15)) [3.14159265358979]

jeroen commented 1 year ago

OK and now look up the true value of π with 17 digits.

cat-zeppelin commented 1 year ago

np, here it is json <- jsonlite::toJSON(pi, digits = I(17)) pi2 <- jsonlite::fromJSON(json) identical(pi, pi2)

> [1] TRUE

cat-zeppelin commented 1 year ago

identical(as.numeric("3.1415926535897932"), as.numeric("3.1415926535897931")) [1] TRUE

jeroen commented 1 year ago

Here is a hint ;) https://www.quora.com/Whats-the-17th-digit-of-pi

It may roundtrip, but the digit is not significant in the statistical sense. It is actually wrong in this case, so that is why we don't print it.

Anyway I guess we can change it, given that it only appears if users explicitly use I(17)

cat-zeppelin commented 1 year ago

it is not said by standard that different decimal representation won't be the same in binary, rather it is said that 17 digits is minimum necessary number of significant digits to represent a floating point number with 53 bits mantisa in decimal representation

cat-zeppelin commented 1 year ago

IEEE 754 says:

image
jeroen commented 1 year ago

it is not said by standard that different decimal representation won't be the same in binary, rather it is said that 17 digits is minimum necessary number of significant digits to represent a floating point number with 53 bits mantisa in decimal representation

Yes I understand it very well :) The question is: do we want a json string that is an accurate representation of the real number (pi in this case) or of the float value approximating that number (including the inaccurate noise part)? As a statistician I am inclined to the former, and and consider the 17th digit insignificant, because it is not accurate.

cat-zeppelin commented 1 year ago

but the same may happen with 0.1 simpy. there is no exact representation in binary form of this simple decimal number. we can not cover all cases of transforming binary to decimal exactelly. we just need to follow standarts that allow 1-to-1 forward and back transforming from binary to decimal and back

jeroen commented 1 year ago

we just need to follow standarts that allow 1-to-1 forward and back transforming from binary to decimal and back

Then the problem appears the other way around, with 17 digits we can no longer convert the decimal to binary and back.

x <- "[3.1415926535897931]"
toJSON(fromJSON(x), digits = I(17))

Anyway it's fine by me, can you send a PR with your change?

cat-zeppelin commented 1 year ago

you use perhaps your original package that takes min between 15 and precision. if you rebuild it then all good

x <- "[3.1415926535897931]" jsonlite::toJSON(jsonlite::fromJSON(x), digits = I(17)) [3.1415926535897931]

another exmple of why we should not believe decimal represenation of binary number is below: (quite obvious)

x <- 0.099999999999999999 jsonlite::toJSON(x) [0.1]

as for PR - I'll do

Kodiologist commented 1 year ago

@jeroen I'm guessing this issue should be closed now that the PR is in.

Relatedly, the help string for serializeJSON says "JSON is a text based format which leads to loss of precision when printing numbers", which isn't quite true. Rather, rounding causes loss of precision. I'd recommend setting the default digits value to NA for this and other functions and removing the warning. This will protect against loss of precision not requested by the user.

jeroen commented 1 year ago

@cat-zeppelin your change has caused some breakage. Can you please fix https://github.com/jeroen/jsonlite/issues/420 ?