r-lib / clock

A Date-Time Library for R
https://clock.r-lib.org

The max/min values of time-points at different precisions are complex to figure out #280

Closed: DavisVaughan closed this issue 1 year ago

DavisVaughan commented 2 years ago
devtools::load_all()
#> ℹ Loading clock

# Largest possible minute precision duration on my Mac
# (would be smaller on Windows where the minute duration type is built on `int` not `int64_t`)
x <- new_duration_from_fields(list(ticks=2147483647L, ticks_of_day=1439L), PRECISION_MINUTE, NULL)
x
#> <duration<minute>[1]>
#> [1] 3092376453119
as_sys_time(x)
#> <time_point<sys><minute>[1]>
#> [1] "20599-06-22T23:59"

# Largest possible second precision duration (still room to go since this is an int64_t)
x <- new_duration_from_fields(list(ticks=2147483647L, ticks_of_day=86399L), PRECISION_SECOND, NULL)
x
#> <duration<second>[1]>
#> [1] 185542587187199
as_sys_time(x)
#> <time_point<sys><second>[1]>
#> [1] "20599-06-22T23:59:59"

# Largest possible millisecond precision duration (still room to go since this is an int64_t)
x <- new_duration_from_fields(list(ticks=2147483647L, ticks_of_day=86399L, ticks_of_second=999L), PRECISION_MILLISECOND, NULL)
x
#> <duration<millisecond>[1]>
#> [1] 185542587187199999
as_sys_time(x)
#> <time_point<sys><millisecond>[1]>
#> [1] "20599-06-22T23:59:59.999"

# Past the max int64_t value so it wrapped
x <- new_duration_from_fields(list(ticks=2147483647L, ticks_of_day=86399L, ticks_of_second=99999L), PRECISION_MICROSECOND, NULL)
x
#> <duration<microsecond>[1]>
#> [1] 1075146450103583839
as_sys_time(x)
#> <time_point<sys><microsecond>[1]>
#> [1] "-29496-01-13T15:41:43.583839"

# Real max microsecond value approaches the max int64_t value 9223372036854775807,
# but falls short by 675808 here because ticks_of_second is only 99999L (it can go up to 999999L; 775807L would hit the max exactly)
x <- new_duration_from_fields(list(ticks=106751991L, ticks_of_day=14454L, ticks_of_second=99999L), PRECISION_MICROSECOND, NULL)
x
#> <duration<microsecond>[1]>
#> [1] 9223372036854099999
as_sys_time(x)
#> <time_point<sys><microsecond>[1]>
#> [1] "32103-01-10T04:00:54.099999"
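
Every printed duration above is consistent with recomposing the fields as `(ticks * ticks_per_day + ticks_of_day) * subseconds_per_second + ticks_of_second` in a 64-bit integer. Here is a minimal C++ sketch of that arithmetic, assuming this formula. The helpers `checked_mul_add`, `recompose`, and `recompose_wrapping` are hypothetical and are not clock's C++ internals. The sketch reproduces each value above and reports the first microsecond case as an overflow:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helpers, not clock's actual C++ code. They assume the fields
// recompose as (ticks * day_factor + ticks_of_day) * sub_factor + ticks_of_second.
constexpr std::int64_t kInt64Max = std::numeric_limits<std::int64_t>::max();

// Checked a * b + c for non-negative inputs; nullopt if the result exceeds int64_t.
std::optional<std::int64_t> checked_mul_add(std::int64_t a, std::int64_t b, std::int64_t c) {
  assert(a >= 0 && b > 0 && c >= 0);  // this sketch only handles non-negative values
  if (a > (kInt64Max - c) / b) {
    return std::nullopt;
  }
  return a * b + c;
}

// day_factor: ticks_of_day units per day (1440 for minute, 86400 for second and finer).
// sub_factor: ticks_of_second units per second (1 when there is no ticks_of_second field).
std::optional<std::int64_t> recompose(std::int64_t ticks, std::int64_t ticks_of_day,
                                      std::int64_t ticks_of_second,
                                      std::int64_t day_factor, std::int64_t sub_factor) {
  const std::optional<std::int64_t> coarse = checked_mul_add(ticks, day_factor, ticks_of_day);
  if (!coarse) {
    return std::nullopt;
  }
  return checked_mul_add(*coarse, sub_factor, ticks_of_second);
}

// Same arithmetic in uint64_t, where overflow wraps modulo 2^64; this mirrors
// the wrapped value printed for the first microsecond example.
std::int64_t recompose_wrapping(std::int64_t ticks, std::int64_t ticks_of_day,
                                std::int64_t ticks_of_second,
                                std::int64_t day_factor, std::int64_t sub_factor) {
  std::uint64_t total = static_cast<std::uint64_t>(ticks);
  total = total * static_cast<std::uint64_t>(day_factor) + static_cast<std::uint64_t>(ticks_of_day);
  total = total * static_cast<std::uint64_t>(sub_factor) + static_cast<std::uint64_t>(ticks_of_second);
  // Converting back to signed is modular in C++20 and on mainstream compilers before that.
  return static_cast<std::int64_t>(total);
}
```

With these helpers, `recompose(2147483647, 86399, 99999, 86400, 1000000)` returns no value. `recompose_wrapping` with the same arguments gives 1075146450103583839, which is the wrapped number printed above.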

This is a result of the way these types are built, with a breakdown into ticks / ticks-of-day / ticks-of-second. If all duration and time-point types instead used two doubles that stored a broken-down int64_t value, we could use the full int64_t range (which I think could be much larger!). That would waste memory in many scenarios, because often we only use `ticks`, as with day precision durations: that is only 4 bytes (1 int) versus the 16 bytes required for 2 doubles. It would definitely simplify the codebase, though, because the number of fields would no longer vary. We'd also have to consider how this works with proxies and comparisons (although I think I figured that out with integer64 in vctrs).
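
The "two doubles holding a broken-down int64_t" idea can be sketched in C++, assuming a simple split into a signed upper 32-bit half and an unsigned lower 32-bit half. `SplitInt64`, `split`, `join`, and `split_less` are hypothetical names. This is not how clock or vctrs actually store values. The sketch shows that the round trip is exact and that a lexicographic comparison on the pair preserves the int64_t order:

```cpp
#include <cstdint>

// Hypothetical layout, not clock's or vctrs' actual representation: one int64_t
// stored across two doubles. Each half fits in 32 bits, and a double holds every
// integer up to 2^53 exactly, so the round trip is lossless.
struct SplitInt64 {
  double hi;  // signed upper half, in [-2^31, 2^31)
  double lo;  // unsigned lower half, in [0, 2^32)
};

constexpr std::int64_t kTwoPow32 = 4294967296LL;

constexpr SplitInt64 split(std::int64_t x) {
  const std::uint64_t bits = static_cast<std::uint64_t>(x);  // x modulo 2^64
  std::int64_t hi = static_cast<std::int64_t>(bits >> 32);    // in [0, 2^32)
  if (hi >= kTwoPow32 / 2) {
    hi -= kTwoPow32;  // reinterpret the upper half as signed
  }
  const std::int64_t lo = static_cast<std::int64_t>(bits & 0xFFFFFFFFULL);
  return SplitInt64{static_cast<double>(hi), static_cast<double>(lo)};
}

constexpr std::int64_t join(SplitInt64 s) {
  // hi * 2^32 + lo stays within [INT64_MIN, INT64_MAX], so nothing overflows.
  return static_cast<std::int64_t>(s.hi) * kTwoPow32 + static_cast<std::int64_t>(s.lo);
}

// Comparing (hi, lo) lexicographically gives the same order as the original
// int64_t values, one possible basis for a comparison proxy.
constexpr bool split_less(SplitInt64 a, SplitInt64 b) {
  return a.hi < b.hi || (a.hi == b.hi && a.lo < b.lo);
}
```

The cost is the one described above: every value occupies two doubles, including values that would fit in a single `int`.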

It gets even more complex because each platform decides which type is used for each duration type:
- General guidelines: https://en.cppreference.com/w/cpp/chrono/duration
- Windows implementation: https://docs.microsoft.com/en-us/cpp/standard-library/chrono?view=msvc-170#typedefs (notice `long long` for seconds but `int` for minutes)
- GCC implementation: https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/libstdc%2B%2B-v3/include/std/chrono#L816-L845 (notice `int64_t` for everything!)
- Mac seems to use `int64_t` through hours, then switches to `int` for days.
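
To see what a given platform actually picks, a small C++ sketch can query `std::numeric_limits` for each standard duration's `rep`. The `static_assert`s encode the minimum widths cppreference lists: a signed integer of at least 64, 55, 45, 35, 29, and 23 bits for nanoseconds through hours. `rep_is_wide_enough` and `max_whole_days` are hypothetical helper names. `std::chrono::days` is left out because it requires C++20.

```cpp
#include <chrono>
#include <cstdint>
#include <limits>

// True if Duration's rep has at least MinBits bits including the sign bit
// (numeric_limits<T>::digits counts only the non-sign bits of a signed type).
template <class Duration, int MinBits>
constexpr bool rep_is_wide_enough() {
  return std::numeric_limits<typename Duration::rep>::digits + 1 >= MinBits;
}

// Minimum widths the standard requires, as listed on cppreference.
static_assert(rep_is_wide_enough<std::chrono::nanoseconds, 64>(), "nanoseconds");
static_assert(rep_is_wide_enough<std::chrono::microseconds, 55>(), "microseconds");
static_assert(rep_is_wide_enough<std::chrono::milliseconds, 45>(), "milliseconds");
static_assert(rep_is_wide_enough<std::chrono::seconds, 35>(), "seconds");
static_assert(rep_is_wide_enough<std::chrono::minutes, 29>(), "minutes");
static_assert(rep_is_wide_enough<std::chrono::hours, 23>(), "hours");

// Largest number of whole days a Duration can hold with this platform's rep.
template <class Duration>
constexpr std::intmax_t max_whole_days() {
  using Period = typename Duration::period;
  constexpr std::intmax_t ticks_per_day = 86400 * Period::den / Period::num;
  return std::numeric_limits<typename Duration::rep>::max() / ticks_per_day;
}
```

With a 64-bit microseconds rep, `max_whole_days<std::chrono::microseconds>()` is 106751991, which is the `ticks` value used in the last R example. The leftover 14454775807 microseconds equals 14454 seconds plus 775807 microseconds. On MSVC, where `minutes::rep` is `int`, `max_whole_days<std::chrono::minutes>()` would instead be 2147483647 / 1440 = 1491308 days.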

DavisVaughan commented 1 year ago

This has gotten better with #331. It isn't perfect, but we now at least align with the limits of date itself.
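
To get a rough sense of what aligning with date's limits involves, here is a C++ sketch that assumes date's `year` type (like C++20 `std::chrono::year`) spans -32767 to 32767. It uses Howard Hinnant's `days_from_civil` algorithm to find the day counts of the first and last days in that range. It then checks which tick sizes keep every instant in that span within an int64_t. `covers_date_year_range` is a hypothetical helper, and the sketch does not reproduce what #331 changed.

```cpp
#include <cstdint>

// Howard Hinnant's days_from_civil algorithm: days since 1970-01-01 for a
// proleptic Gregorian year/month/day.
constexpr std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d) {
  y -= m <= 2;
  const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
  return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Assumed year range of date's year type (also C++20 std::chrono::year).
constexpr std::int64_t kMinYear = -32767;
constexpr std::int64_t kMaxYear = 32767;

// Hypothetical helper: true if every instant from kMinYear-01-01 through the
// end of kMaxYear-12-31 fits in an int64_t count of units_per_day ticks per day.
constexpr bool covers_date_year_range(std::int64_t units_per_day) {
  const std::int64_t first_day = days_from_civil(kMinYear, 1, 1);
  const std::int64_t last_day = days_from_civil(kMaxYear, 12, 31);
  // Integer division truncates toward zero, so INT64_MIN / units_per_day is
  // the smallest day whose first tick is still representable.
  return first_day >= INT64_MIN / units_per_day &&
         last_day + 1 <= INT64_MAX / units_per_day;
}
```

Under these assumptions, microsecond counts cover date's full year range. Nanosecond counts, which reach only about ±292 years around 1970, do not.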