jeroen / jsonlite

A Robust, High Performance JSON Parser and Generator for R
http://arxiv.org/abs/1403.2805

Inconsistent treatment of digits across 1e-05 #184

Open ivirshup opened 7 years ago

ivirshup commented 7 years ago

Noticed while debugging a Glimma plot. Basically, toJSON rounds values greater than 1e-05 differently from those smaller than 1e-05. As a result, values just above 1e-05 round to 0 with the default (or a lower) digits setting, while smaller values keep their significant digits.

library(jsonlite)
values = exp(log(10)*seq(log10(1e-9),log10(1),length.out=1000))
summary(values[fromJSON(toJSON(values)) == 0])
Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
1.021e-05 1.514e-05 2.246e-05 2.492e-05 3.331e-05 4.940e-05 

The issue seems to be that toJSON does not consider an exponential representation for numbers larger than 10^-5.

toJSON(values[1:10*100])
# [7.7964e-09,6.2057e-08,4.9396e-07,3.9318e-06,0,0.0002,0.002,0.0158,0.1256,1] 
fromJSON(toJSON(values[1:10*100]))
# 7.7964e-09 6.2057e-08 4.9396e-07 3.9318e-06 0.0000e+00 2.0000e-04 2.0000e-03 1.5800e-02 1.2560e-01 1.0000e+00

toJSON(values[1:10*100], digits=3)
# [7.796e-09,6.206e-08,4.94e-07,3.932e-06,0,0,0.002,0.016,0.126,1] 

This issue can lead to unintuitive results in downstream packages. A possible fix is using a boolean flag for exponential representation of values.
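
The two rounding modes at play can be illustrated in base R, without jsonlite. This is a minimal sketch of rounding to a fixed number of decimal places versus rounding to significant digits, not a description of toJSON's internals; the input values and variable names are chosen here for illustration only.

```r
# Base R only: contrast fixed decimal places with significant digits.
x <- c(3.5e-05, 3.5e-06)

# Rounding to 4 decimal places collapses both values to zero.
decimal <- round(x, digits = 4)

# Rounding to 4 significant digits keeps the magnitude of each value.
significant <- signif(x, digits = 4)

print(decimal)
print(significant)
```

Judging by the outputs shown above, toJSON behaves like the first mode for values above 1e-05 and like the second for values below it, which is the inconsistency reported here.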

sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin16.5.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] jsonlite_1.4

loaded via a namespace (and not attached):
[1] compiler_3.4.0

jeroen commented 7 years ago

> A possible fix is using a boolean flag for exponential representation of values.

We have that with use_signif:

toJSON(x, use_signif = TRUE)

Also you can set digits to NA to print with maximum precision:

toJSON(x, digits = NA)
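
For a side-by-side comparison, the two options can be tried on a pair of values that straddle 1e-05. This is a rough sketch that assumes jsonlite is installed and that use_signif is passed through toJSON as described above; the vector x and the variable names are made up for the example.

```r
library(jsonlite)

# Two values on either side of the 1e-05 threshold discussed in this issue.
x <- c(1.01e-05, 1.01e-06)

default_json <- toJSON(x)                     # default digits setting
signif_json  <- toJSON(x, use_signif = TRUE)  # flag suggested above
full_json    <- toJSON(x, digits = NA)        # maximum precision

print(default_json)
print(signif_json)
print(full_json)

# Parse the maximum-precision output back into R for comparison with x.
roundtrip <- fromJSON(full_json)
print(roundtrip)
```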

ivirshup commented 7 years ago

Is use_signif documented? I don't see it in the help for toJSON.

Additionally, should use_signif be true by default? I think it's a problem that the default behavior is:

> fromJSON(toJSON(1.01e-5)) < fromJSON(toJSON(1.01e-6))
[1] TRUE

mikepqr commented 5 years ago

Just bumping this to say I ran into this unexpected behavior. In my case the code was something like:

> toJSON(data.frame(id = 1:2, val = c(3.5e-5, 3.5e-6)))
[{"id":1,"val":0},{"id":2,"val":3.5e-06}]

The digits=NA fix works, but the default behavior is either a bug or ... bizarre (and should therefore be documented).
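
Until the default changes, downstream code could guard against this with a thin wrapper. The sketch below is one possible workaround, not something jsonlite provides: to_json_full is a hypothetical helper name, and it only changes the default of digits to NA.

```r
library(jsonlite)

# Hypothetical convenience wrapper (not part of jsonlite): serialize numbers
# at maximum precision unless the caller explicitly overrides `digits`.
to_json_full <- function(x, ..., digits = NA) {
  toJSON(x, digits = digits, ...)
}

df <- data.frame(id = 1:2, val = c(3.5e-5, 3.5e-6))

json <- to_json_full(df)
print(json)

# Parse the result back to check that neither value was rounded to zero.
back <- fromJSON(json)
print(back)
```

A caller who really wants rounding can still pass digits explicitly, e.g. to_json_full(df, digits = 4).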

TBlackmore commented 4 years ago

Another bump for this issue; the default behaviour is very strange. The digits = NA fix works, but I spent over an hour finding it.

Thanks for your work maintaining this package, it's been very useful!

krlmlr commented 4 years ago

I agree that digits = NA should be the default; serialization shouldn't alter the precision. This affects {bench} too:

jsonlite::toJSON(data.frame(a = 12.3456789123456789 / (10^(-8:8))), digits = NA, pretty = TRUE)
#> [
#>   {
#>     "a": 1234567891.23457
#>   },
#>   {
#>     "a": 123456789.123457
#>   },
#>   {
#>     "a": 12345678.9123457
#>   },
#>   {
#>     "a": 1234567.89123457
#>   },
#>   {
#>     "a": 123456.789123457
#>   },
#>   {
#>     "a": 12345.6789123457
#>   },
#>   {
#>     "a": 1234.56789123457
#>   },
#>   {
#>     "a": 123.456789123457
#>   },
#>   {
#>     "a": 12.3456789123457
#>   },
#>   {
#>     "a": 1.23456789123457
#>   },
#>   {
#>     "a": 0.123456789123457
#>   },
#>   {
#>     "a": 0.0123456789123457
#>   },
#>   {
#>     "a": 0.00123456789123457
#>   },
#>   {
#>     "a": 0.000123456789123457
#>   },
#>   {
#>     "a": 1.23456789123457e-05
#>   },
#>   {
#>     "a": 1.23456789123457e-06
#>   },
#>   {
#>     "a": 1.23456789123457e-07
#>   }
#> ]

Created on 2020-07-05 by the reprex package (v0.3.0)

For reference, write.csv() doesn't round either.

data <- data.frame(a = 12.3456789123456789 / (10^(-8:8)))
write.csv(data, "out.csv", row.names = FALSE)
cat(readLines("out.csv"), sep = "\n")
#> "a"
#> 1234567891.23457
#> 123456789.123457
#> 12345678.9123457
#> 1234567.89123457
#> 123456.789123457
#> 12345.6789123457
#> 1234.56789123457
#> 123.456789123457
#> 12.3456789123457
#> 1.23456789123457
#> 0.123456789123457
#> 0.0123456789123457
#> 0.00123456789123457
#> 0.000123456789123457
#> 1.23456789123457e-05
#> 1.23456789123457e-06
#> 1.23456789123457e-07

Created on 2020-07-05 by the reprex package (v0.3.0)
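
To put a number on the precision loss, the relative error of a JSON round trip can be computed for the same data. This is a small sketch assuming jsonlite is installed; the helper roundtrip and the variable names are introduced here for illustration and do not come from the reprex above.

```r
library(jsonlite)

data <- data.frame(a = 12.3456789123456789 / (10^(-8:8)))

# Hypothetical helper: serialize to JSON, parse back, return column `a`.
roundtrip <- function(df, ...) {
  fromJSON(toJSON(df, ...))$a
}

# Relative error per row after a round trip with the default digits setting
# and with digits = NA (maximum precision).
rel_err_default <- abs(roundtrip(data) - data$a) / data$a
rel_err_full    <- abs(roundtrip(data, digits = NA) - data$a) / data$a

print(signif(rel_err_default, 2))
print(signif(rel_err_full, 2))
```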