tidyverse / readr

Read flat files (csv, tsv, fwf) into R
https://readr.tidyverse.org

Strange rounding errors with signif and write_csv #1502

Open MarekGierlinski opened 1 year ago

MarekGierlinski commented 1 year ago

I noticed that certain numbers cause write_csv, write_tsv, etc. to produce strange rounding errors. Here is an example:

library(readr)

df <- data.frame(x = 1.406148e-20)
df$rounded = signif(df$x, 4)

write_csv(df, "test1.csv")
write.csv(df, "test2.csv", row.names = FALSE)

The content of the test1.csv file is:

x,rounded
1.406148e-20,1.4060000000000002e-20

The problem seems to occur only with readr. The test2.csv file is created as expected:

"x","rounded"
1.406148e-20,1.406e-20

I wonder if this is reproducible on other systems.

```
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] readr_2.1.4
```

MarekGierlinski commented 1 year ago

I noticed that this issue might be related to #1495 and #1441.

hadley commented 1 year ago

Could you please rework your reproducible example to use the reprex package? That makes it easier to see both the input and the output, formatted so that I can easily re-run it in a local session.

MarekGierlinski commented 1 year ago
library(readr)

df <- data.frame(x = 1.406148e-20)
df$rounded = signif(df$x, 4)
df
#>              x   rounded
#> 1 1.406148e-20 1.406e-20

write_csv(df, stdout())
#> x,rounded
#> 1.406148e-20,1.4060000000000002e-20

write.csv(df, stdout(), row.names = FALSE)
#> "x","rounded"
#> 1.406148e-20,1.406e-20
hadley commented 1 year ago

Thanks! Here's a somewhat more minimal reprex:

library(readr)

df <- data.frame(x = 1.406148e-20)
df$rounded <- signif(df$x, 4)
cat(format_csv(df))
#> x,rounded
#> 1.4061479999999998e-20,1.4060000000000002e-20

Created on 2023-08-02 with reprex v2.0.2

FWIW, these digits aren't invented; they're just not normally printed:

sprintf("%.20e", df$rounded)
#> [1] "1.40600000000000021268e-20"
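A minimal base-R sketch of why those digits usually stay hidden: R displays `getOption("digits")` significant digits, 7 by default, whereas asking for 17 significant digits (the same count as in the `format_csv()` output above) exposes the tail of the stored binary value. The 17-digit calls are left unannotated because their exact strings depend on the double that `signif()` returns.

```r
# Reproduce the value from the issue.
df <- data.frame(x = 1.406148e-20)
df$rounded <- signif(df$x, 4)

# format() and print() use getOption("digits") significant digits
# (7 unless the option has been changed), which hides the noise.
getOption("digits")
#> [1] 7
format(df$rounded)
#> [1] "1.406e-20"

# Asking for up to 17 significant digits reveals the tail of the stored double.
format(df$rounded, digits = 17)
sprintf("%.17g", df$rounded)
```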
MarekGierlinski commented 1 year ago

Oh, I see. So we are hitting the limits of the binary floating-point representation of numbers? Nothing to do with readr or signif?

sprintf("%.50e", 1.2)
#> [1] "1.19999999999999995559107901499373838305473327636719e+00"

Still, it would be nice to have numbers truncated nicely, as the default write.csv does.
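One possible workaround until this is fixed, sketched under the assumption that 15 significant digits are enough for your data: convert plain double columns to character with `sprintf("%.15g", ...)` before writing. This assumes, per `?write.table`, that base R writes doubles with the equivalent of 15 significant digits, so the result should roughly match `write.csv()`. The helper name `format_doubles_15()` is hypothetical and not part of readr.

```r
# Hypothetical helper, not part of readr: turn every plain (unclassed) double
# column into character using at most 15 significant digits, so representation
# noise beyond the 15th digit is not written. %g also drops trailing zeros.
# Classed doubles such as Date or POSIXct columns are left untouched.
format_doubles_15 <- function(df) {
  is_plain_dbl <- vapply(df, function(col) is.double(col) && !is.object(col),
                         logical(1))
  df[is_plain_dbl] <- lapply(df[is_plain_dbl], function(col) {
    out <- sprintf("%.15g", col)
    out[is.na(col)] <- NA_character_  # sprintf() would otherwise return "NA"
    out
  })
  df
}

df <- data.frame(x = 1.406148e-20)
df$rounded <- signif(df$x, 4)

out <- format_doubles_15(df)
out$x
#> [1] "1.406148e-20"
out$rounded
#> [1] "1.406e-20"

# Then write as usual, e.g. readr::write_csv(out, "test1.csv")
```

Because anything beyond the 15th significant digit is dropped, values that genuinely need 16 or 17 digits to round-trip will be written slightly changed, which is exactly the risk of removing important decimal places.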

hadley commented 1 year ago

Oh yeah, it's definitely a bug; it will just require some thought to fix, because we need to apply some (probably well-known) heuristic to avoid accidentally removing decimal places that are important.
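One well-known heuristic of this kind is the shortest round-trip representation, the idea behind algorithms such as Grisu and Ryu: print the fewest significant digits whose decimal string parses back to exactly the same double. The brute-force sketch below only illustrates that idea and is not readr's implementation; `shortest_roundtrip()` is a hypothetical name. It never loses information, but it also cannot shorten a value that is not the double nearest to its short decimal form, and going by the `%.20e` output above, `signif()` may produce such values.

```r
# Hypothetical helper illustrating the "shortest round-trip" heuristic:
# for each value, try 1 to 17 significant digits and keep the first string
# that converts back to the identical double. 17 significant digits are
# enough to distinguish any two IEEE 754 doubles, so the final "%.17g"
# fallback is only reachable if string-to-double conversion is not exact.
shortest_roundtrip <- function(x) {
  vapply(x, function(v) {
    if (!is.finite(v)) return(as.character(v))  # NA, NaN, Inf, -Inf
    for (digits in 1:17) {
      s <- sprintf(paste0("%.", digits, "g"), v)
      if (as.numeric(s) == v) return(s)
    }
    sprintf("%.17g", v)
  }, character(1), USE.NAMES = FALSE)
}

shortest_roundtrip(c(1.2, 0.5, 1e-20, 1/3))
```

Trying digit counts in increasing order is what guarantees the shortest string; production implementations reach the same answer far more efficiently than this loop.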