Closed allenbaron closed 1 year ago
I'm doing issue triage prior to a release that's under deadline.
But from a quick read, I think this may be a combination of R FAQ 7.31 (https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f) with a light sprinkling of (insert something about POSIXct formatting here). I feel like there's something I can point to about POSIXct formatting and hopefully I'll come back to add it.
This is likely a #wontfix.
Coming back just to park a small example from @DavisVaughan about some of the joys of POSIXct. Again, I'm not really digging it to this just yet, but want to leave myself some notes.
> options(digits.secs = 20)
> Sys.time()
[1] "2023-03-23 15:47:08.864708 EDT"
> unclass(Sys.time())
[1] 1679600841
> options(digits = 20)
> unclass(Sys.time())
[1] 1679600848.7778339386
A little investigation of my own.
The date-time we used above is:
1660163842
converting that to google sheet form using the algorithm in as_CellData.POSIXct
gives us:
# (1660163842 / 86400) + 25569
44783.859282407407591
now, deep in the stack we eventually call httr::POST()
to send the data over the wire to google sheets. This ends up calling jsonlite::toJSON(body, auto_unbox = TRUE, digits = 22)
which gives us:
44783.8592824074
So we lost some precision. And indeed if you go to the google sheet that was created and convert the cell to Number format you'll see 44783.8592824074
in there.
If you reverse the algorithm with this you get:
# (44783.8592824074 - 25569) * 86400
1660163841.9999992847
# displays as "2022-08-10 20:37:21 UTC"
It is also worth noting that googlesheets4 doesn't "reverse the algorithm" in exactly this way, but i think it should. Instead inside googlesheets4:::as_datetime()
it ends up doing the multiplication first, then the subtraction, like:
# (44783.8592824074 * 86400) - (25569 * 86400)
1660163841.9999995232
Note that doing the math that way gave a slightly different answer than above, and I think it is slightly less correct, but this does reproduce the time difference of 4.768372e-07
reported in the original comment above.
And that jsonlite failure seems to come from an internal helper called num_to_char()
which is a C algorithm that seems to differ from format()
Browse[14]> jsonlite:::num_to_char(44783.859282407407591, digits = 22)
[1] "44783.8592824074"
Browse[14]> format(44783.859282407407591, digits = 22)
[1] "44783.85928240740759065"
It seems to limit digits
to 17 at some point
https://github.com/jeroen/jsonlite/blob/467923567ce2c2d535714cc3908a6912000a29ae/src/num_to_char.c#L65
@jeroen, any idea why that is?
Ha oh shoot if you just update to dev jsonlite then the jsonlite part of this is actually fixed:
> unclass(df$dt)/86400 + 25569
[1] 44783.85928240740759065
attr(,"tzone")
[1] "UTC"
> jsonlite::fromJSON(jsonlite::toJSON(unclass(df$dt)/86400 + 25569, digits = 22))
[1] 44783.85928240740759065
Fixed by https://github.com/jeroen/jsonlite/pull/413, which raised the cap from 15 to 17 digits (and 17 actually works)
It was argued for here https://github.com/jeroen/jsonlite/issues/350, since IEEE754 doubles converted to string with 17 significant digits must be able to roundtrip themselves
I did a little study of this and, at least with this example, the jsonlite fix is the only thing that touches it. Changing the order of operations in googlesheets4 when converting from a spreadsheet date time to POSIXct
didn't help (in fact the time difference got worse). So I think answer here is that users who need this level of precision right now should install dev jsonlite. And presumably a released version will appear in due course.
@jeroen Can you note this down as a reason in favor of a jsonlite release? TL;DR the 17 significant digits are necessary for roundtripping datetimes between R and Google Sheets.
OK I'll do that today.
Where exactly does jsonlite get invoked in this stack? We might have to backpaddle a little bit on the default in jsonlite because of https://github.com/jeroen/jsonlite/issues/420, but if you explicitly set digits = 17
then you should be fine.
I can't re-trace this right this moment, but I have no reason to doubt @DavisVaughan's research described above in https://github.com/tidyverse/googlesheets4/issues/273#issuecomment-1481888454.
now, deep in the stack we eventually call
httr::POST()
to send the data over the wire to google sheets. This ends up callingjsonlite::toJSON(body, auto_unbox = TRUE, digits = 22)
Also based on my memory, I think the jsonlite usage, on the "write" side, is likely to all be going through httr.
Writing datetimes to a Google Sheet with googlesheets4 and then re-reading it with the column formatted as a datetime alters the value. The change is very small and is not always noticeable with R's default print options.
I've included only a single datetime in the reprex because it shows a difference after printing.