r-lib / clock

A Date-Time Library for R
https://clock.r-lib.org

Document leap second handling (in FAQ article)? #309

Closed trevorld closed 1 year ago

trevorld commented 1 year ago

Perhaps it would make sense to document how {clock} handles leap seconds? Perhaps in the FAQ article?

On my computer, {clock} parses leap seconds as NA values (and issues a warning), and differences between UTC times are measured in POSIX seconds rather than SI/metric seconds. For comparison:

| Package | Parsing leap seconds | Difference between UTC times |
|---|---|---|
| {clock} | as NA | POSIX seconds |
| {nanotime} | as next second | POSIX seconds |
| base::as.POSIXct() | as next second | POSIX seconds |
| base::as.POSIXlt() | as leap second | POSIX seconds (probably after conversion to POSIXct) |

# `base::.leap.seconds` contains (the second after) all UTC leap seconds (so far)
# Due to the leap second at the end of 2005, these are three different UTC seconds
leap_before <- "2005-12-31T23:59:59"
leap_second <- "2005-12-31T23:59:60"
leap_after <- "2006-01-01T00:00:00"

## (on my computer) {clock} won't parse, differences are POSIX seconds
library("clock")
year_month_day_parse(leap_second, precision = "second")
#> Warning: Failed to parse 1 string at location 1. Returning `NA` at that location.
#> <year_month_day<second>[1]>
#> [1] NA
naive_time_parse(leap_second)
#> Warning: Failed to parse 1 string at location 1. Returning `NA` at that location.
#> <clock_naive_time[1]>
#> [1] NA
sys_time_parse(leap_second)
#> Warning: Failed to parse 1 string at location 1. Returning `NA` at that location.
#> <clock_sys_time[1]>
#> [1] NA

# Difference of two metric seconds but (on some computers) one POSIX second
sys_time_parse(leap_after) - sys_time_parse(leap_before)
#> <duration<second>[1]>
#> [1] 1

## {nanotime} will parse as next second, differences are POSIX seconds
library("nanotime")
as.nanotime(paste0(leap_second, "Z"))
#> [1] 2006-01-01T00:00:00+00:00
as.nanotime(paste0(leap_after, "Z")) - as.nanotime(paste0(leap_before, "Z"))
#> [1] 00:00:01

## as.POSIXct() will parse as next second, differences are POSIX seconds
as.POSIXct(leap_second, format = "%FT%T", tz = "UTC") |> format(format = "%F %T")
#> [1] "2006-01-01 00:00:00"
as.POSIXct(leap_after, format = "%FT%T", tz = "UTC") - as.POSIXct(leap_before, format = "%FT%T", tz = "UTC")
#> Time difference of 1 secs

## as.POSIXlt() will correctly parse leap second...
## but differences are POSIX seconds (presumably converted to `POSIXct` before differencing)
as.POSIXlt(leap_second, format = "%FT%T", tz = "UTC") |> format(format = "%F %T")
#> [1] "2005-12-31 23:59:60"
as.POSIXlt(leap_after, format = "%FT%T", tz = "UTC") - as.POSIXlt(leap_before, format = "%FT%T", tz = "UTC")
#> Time difference of 1 secs

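
To make the POSIX-versus-SI distinction concrete, here is a minimal base R sketch that adds back the leap seconds a `POSIXct` difference leaves out. The helper `si_seconds_between()` is hypothetical (it is not part of base R, {clock}, or {nanotime}); it assumes both inputs are UTC instants and relies only on `base::.leap.seconds`, whose entries are the instants immediately after each inserted leap second.

```r
# Hypothetical helper (not from any package): elapsed SI seconds between two
# UTC instants, adding back the leap seconds that POSIXct arithmetic ignores.
si_seconds_between <- function(from, to) {
  stopifnot(inherits(from, "POSIXct"), inherits(to, "POSIXct"), from <= to)
  posix_secs <- as.numeric(difftime(to, from, units = "secs"))
  # Each entry of .leap.seconds is the instant just after an inserted leap
  # second, so a leap second lies in (from, to] exactly when its entry does.
  n_leap <- sum(.leap.seconds > from & .leap.seconds <= to)
  posix_secs + n_leap
}

leap_before_ct <- as.POSIXct("2005-12-31 23:59:59", tz = "UTC")
leap_after_ct  <- as.POSIXct("2006-01-01 00:00:00", tz = "UTC")

posix_diff <- as.numeric(difftime(leap_after_ct, leap_before_ct, units = "secs"))
si_diff    <- si_seconds_between(leap_before_ct, leap_after_ct)
c(posix = posix_diff, si = si_diff)
```

For the pair of timestamps used above, this should report one POSIX second and two SI seconds; for a span that contains no leap second the two numbers agree.
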
DavisVaughan commented 1 year ago

I'm not sure we should really use any of the current implementations as a good reference source. They all seem kind of hand-wavy, and they allow parsing of non-leap seconds too. Basically it seems like they all have some simple special handling of "60s".

The main thing is that with POSIXct, leap seconds are completely ignored, full stop. See ?POSIXct:

"POSIXct" times used by R do not include leap seconds on any platform.

So really it is just a matter of what to do during parsing.
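
A quick way to see this in base R is to measure the UTC day that contained the 2005 leap second. This is only an illustrative check, not anything {clock} does internally:

```r
# Midnight on either side of 2005-12-31, as POSIXct seconds since the epoch
day_start <- as.numeric(as.POSIXct("2005-12-31 00:00:00", tz = "UTC"))
day_end   <- as.numeric(as.POSIXct("2006-01-01 00:00:00", tz = "UTC"))

# That UTC day really lasted 86401 SI seconds, but POSIXct counts a plain 86400
day_length <- day_end - day_start
day_length

# Consequently every UTC midnight is an exact multiple of 86400 in POSIXct
midnight_remainder <- day_end %% 86400
midnight_remainder
```
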

The "right" solution is to only allow 60s when parsing if you are actually on a leap second date. Then you need a way to store it, and you have to make a decision about what to do with it when converting to sys-time or naive-time (and from there, Date and POSIXct), which don't support leap seconds. <date> includes a utc_clock class that can handle this, and it actually parses correctly by checking against the actual leap seconds to see if it corresponds to a real leap second or not: https://github.com/HowardHinnant/date/blob/50acf3ffd8b09deeec6980be824f2ac54a50b095/include/date/tz.h#L2022-L2032
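
As a rough base R illustration of the "only allow 60s on a real leap second" check (this is not how <date> or {clock} implements it), the sketch below validates a string against `base::.leap.seconds`. The helper name `is_real_leap_second()` is hypothetical, and it assumes UTC inputs in `%Y-%m-%dT%H:%M:%S` form.

```r
# Hypothetical helper: TRUE only when a UTC "YYYY-MM-DDT23:59:60" string names
# a leap second that actually occurred according to base::.leap.seconds.
is_real_leap_second <- function(x) {
  has_60 <- grepl("T23:59:60$", x)
  # Parse the second *before* the candidate leap second. The matching entry in
  # .leap.seconds is the instant right after the leap second, which is exactly
  # one POSIX second later than 23:59:59.
  before <- as.POSIXct(sub(":60$", ":59", x),
                       format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
  has_60 & !is.na(before) &
    (as.numeric(before) + 1) %in% as.numeric(.leap.seconds)
}

candidates <- c(
  "2005-12-31T23:59:60",  # real leap second
  "2006-12-31T23:59:60",  # no leap second at the end of 2006
  "2005-12-31T23:59:59"   # ordinary second
)
checked <- is_real_leap_second(candidates)
checked
```

Only the first string should pass, so a parser built on a check like this would reject the 2006 example that the current implementations accept.
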

When going from utc_clock -> sys-time / naive-time, date maps leap seconds to the nearest possible moment in time before the leap second, which is reasonable.
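
At second precision, that mapping amounts to clamping 23:59:60 back to 23:59:59 rather than rolling forward to the next day. The base R sketch below mimics the idea on strings; `leap_to_posixct_floor()` is a hypothetical name, it assumes UTC input, and it assumes the string has already been validated as a real leap second.

```r
# Hypothetical sketch: convert a UTC string that may end in ":60" to POSIXct by
# clamping to the last representable second before the leap second.
leap_to_posixct_floor <- function(x) {
  clamped <- sub(":60$", ":59", x)
  as.POSIXct(clamped, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
}

x <- "2005-12-31T23:59:60"

# Clamps backward, staying on 2005-12-31
floored <- format(leap_to_posixct_floor(x), "%Y-%m-%d %H:%M:%S")
floored

# Base R's own conversion, for comparison
rolled <- format(as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC"),
                 "%Y-%m-%d %H:%M:%S")
rolled
```

Clamping backward keeps the result on the same calendar date as the input, whereas the base R conversion shown earlier in the thread lands on the following day.
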

I may include this in the future, but leap seconds are a little complicated because they are included in the text form of the time zone database (that clock uses now) but not in the binary form of the time zone database on Mac (which we may switch to in the future for performance). So I'd have to come up with a way to deal with that.

For now I will add some docs about this to the FAQ, as you suggest.

# Note that 2006 here was not a leap second year, but parsing "sort of works" anyways

format <- "%Y-%m-%d %H:%M:%S"

# POSIXlt allows 60s, but that rolls over when converting to POSIXct
lubridate::fast_strptime("2006-12-31 23:59:60", format)
#> [1] "2006-12-31 23:59:60 UTC"
lubridate::fast_strptime("2006-12-31 23:59:60", format, lt = FALSE)
#> [1] "2007-01-01 UTC"

# Can't represent 61s in POSIXlt, so lubridate rolls over even in the POSIXlt
lubridate::fast_strptime("2006-12-31 23:59:61", format)
#> [1] "2007-01-01 00:00:01 UTC"
lubridate::fast_strptime("2006-12-31 23:59:61", format, lt = FALSE)
#> [1] "2007-01-01 00:00:01 UTC"

# But it thinks this is garbage?
lubridate::fast_strptime("2006-12-31 23:59:62", format)
#> [1] NA
lubridate::fast_strptime("2006-12-31 23:59:62", format, lt = FALSE)
#> [1] NA

# POSIXlt allows 60s, rolls over when converting to POSIXct
strptime("2006-12-31 23:59:60", format, tz = "UTC")
#> [1] "2006-12-31 23:59:60 UTC"
as.POSIXct(strptime("2006-12-31 23:59:60", format, tz = "UTC"))
#> [1] "2007-01-01 UTC"

# POSIXlt can't handle 61s, so base R says this is NA
strptime("2006-12-31 23:59:61", format, tz = "UTC")
#> [1] NA
as.POSIXct(strptime("2006-12-31 23:59:61", format, tz = "UTC"))
#> [1] NA

# Rolls over for 60s, errors on 61s
nanotime::as.nanotime("2006-12-31T23:59:60Z")
#> [1] 2007-01-01T00:00:00+00:00
try(nanotime::as.nanotime("2006-12-31T23:59:61Z"))
#> Error in RcppCCTZ::parseDouble(x, fmt = format, tzstr = tz) : 
#>   Parse error on 2006-12-31T23:59:61Z

Created on 2023-04-21 with reprex v2.0.2.9000