r-lib / clock

A Date-Time Library for R
https://clock.r-lib.org
Other
102 stars 5 forks source link

No warning on overflow and unexpected low supported time range. #363

Closed sorhawell closed 9 months ago

sorhawell commented 9 months ago

Hi clock people :)

I was taking a first look at the clock package to write conversions to/from r-polars. Looks good!

It appears "s" and "ns" somehow overflows without warning, whereas "ms" and "us" does not. I have not come to see the arithmetics-part of upper-lower time component, but I imagined you should be able to support year 0 to 140 million with "ns" precision with two R numerics with about 2^52 range right?

best Soren

char_times = c(
  "0001-01-01 01:01:01.000000001",
  "2212-01-01 12:34:57.123456789",
  "3712-01-01 12:34:56.123456789"
)
fmt = "%Y-%m-%d %H:%M:%OS"
clock_times = list(
  ns = clock::naive_time_parse(char_times , format = fmt, precision = "nanosecond"),
  us =  clock::naive_time_parse(char_times , format = fmt, precision = "microsecond"),
  ms =  clock::naive_time_parse(char_times , format = fmt, precision = "millisecond"),
  s = clock::naive_time_parse(char_times , format = fmt, precision = "nanosecond"),
  d = clock::naive_time_parse(char_times , format = fmt, precision = "day")
)

clock_times
#> $ns
#> <naive_time<nanosecond>[3]>
#> [1] "1754-08-30T23:44:42.128654849" "2212-01-01T12:34:57.123456789"
#> [3] "1958-05-04T13:51:14.994801941"
#> 
#> $us
#> <naive_time<microsecond>[3]>
#> [1] "0001-01-01T01:01:01.000000" "2212-01-01T12:34:57.123456"
#> [3] "3712-01-01T12:34:56.123456"
#> 
#> $ms
#> <naive_time<millisecond>[3]>
#> [1] "0001-01-01T01:01:01.000" "2212-01-01T12:34:57.123"
#> [3] "3712-01-01T12:34:56.123"
#> 
#> $s
#> <naive_time<nanosecond>[3]>
#> [1] "1754-08-30T23:44:42.128654849" "2212-01-01T12:34:57.123456789"
#> [3] "1958-05-04T13:51:14.994801941"
#> 
#> $d
#> <naive_time<day>[3]>
#> [1] "0001-01-01" "2212-01-02" "3712-01-02"
installed.packages()["clock","Version"]
#> [1] "0.7.0"
clock::tzdb_version()
#> [1] "2023c"
R.version
#>                _                           
#> platform       x86_64-apple-darwin20       
#> arch           x86_64                      
#> os             darwin20                    
#> system         x86_64, darwin20            
#> status                                     
#> major          4                           
#> minor          3.0                         
#> year           2023                        
#> month          04                          
#> day            21                          
#> svn rev        84292                       
#> language       R                           
#> version.string R version 4.3.0 (2023-04-21)
#> nickname       Already Tomorrow

Created on 2023-12-12 with reprex v2.0.2

DavisVaughan commented 9 months ago

It appears "s" and "ns" somehow overflows without warning

Note that for "s" you actually put "nanosecond" again, I imagine if you put "second" it would work

This is really a limitation of the underlying C++ library, date. It, like many other date time libraries, uses int64_t as the C++ representation of nanosecond time points (See std::chrono::nanoseconds under Helper Types at https://en.cppreference.com/w/cpp/chrono/duration).

This gives the nanosecond precision a range of around +/- 292 years around the epoch year of 1970.

On one side nanoseconds is represented by a int64_t which has a range of about +/- 292 years. (https://howardhinnant.github.io/date/date.html)

We definitely don't have any overflow protection yet, but that is quite hard to get right everywhere and will take a decent amount of effort to do it in a way that won't nuke performance too

DavisVaughan commented 9 months ago

https://github.com/r-lib/clock/pull/331 is also worth a read from a technical perspective

DavisVaughan commented 9 months ago

IIUC Rust Polars also uses an i64 for their date time types, with possible precision of nano, micro, milli https://docs.rs/polars/latest/polars/datatypes/enum.DataType.html#variant.Datetime, so it probably has the same restrictions

sorhawell commented 9 months ago

Note that for "s" you actually put "nanosecond" again, I imagine if you put "second" it would work

@DavisVaughan Oups sorry for drama :)

Then this issue can be closed. Can I ask on side note:

I got to the unpack / pack arithmetics

Is this correctly understood?:

The R clock package stores internally a naive_time as two R doubles which represent a u32 range The two u32 representa the upper and lower part of an u64 The u64 is offset into an i64

DavisVaughan commented 9 months ago

"two R doubles which represent a u32 range" i'd say they represent an i64 range, just in a weird way

The pack/unpack helpers there are definitely the key utilities to make this work

The rest of that looks right. In particular the arithmetic shift is important because it helps retain relative ordering (i.e. you can compare one lower/upper pair to another and get the same ordering result as if you compared the original i64s together, useful for our vctrs algorithms like vec_order() or vec_compare()) https://github.com/r-lib/clock/blob/adc01b61670b18463cc3087f1e58acf59ddc3915/src/duration.h#L146-L148