pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Cannot use very large/small Datetime values #13404

Open stinodego opened 9 months ago

stinodego commented 9 months ago

Reproducible example

import polars as pl

s = pl.Series([-9_223_372_036_854_775_808]).cast(pl.Datetime("us"))
print(s)
# pyo3_runtime.PanicException: invalid or out-of-range datetime

Investigating further, it's not just printing that fails; further computation on the column fails as well:

result = s.dt.month()
# pyo3_runtime.PanicException: invalid or out-of-range datetime

The lower limit seems to be somewhere around -8_334_000_000_000_000_000 for microsecond datetimes:

s = pl.Series("a", [-8_334_000_000_000_000_000]).cast(pl.Datetime("us"))
print(s)  # works, shows -262124-01-20 16:00:00
s = pl.Series("a", [-8_335_000_000_000_000_000]).cast(pl.Datetime("us"))
print(s)  # error

For nanoseconds, it seems to underflow instead of raising:

s = pl.Series("a", [-9_223_372_036_854_775_808]).cast(pl.Datetime("ns"))
print(s)  # works, shows 1677-09-21 00:12:43.145224192
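
As a cross-check on whether this is really an underflow, the same integer can be converted by hand with Python's standard library. This is only a sketch: `datetime` has microsecond resolution, so the nanosecond remainder is carried separately, and the helper name `ns_to_datetime` is made up for illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def ns_to_datetime(ns: int) -> tuple[datetime, int]:
    """Split a nanosecond Unix timestamp into (datetime, leftover nanoseconds)."""
    # divmod floors, so the remainder stays non-negative for negative input too.
    us, rem_ns = divmod(ns, 1_000)
    return EPOCH + timedelta(microseconds=us), rem_ns

dt, rem_ns = ns_to_datetime(-9_223_372_036_854_775_808)
print(dt, rem_ns)
```

If this arithmetic holds, the result is 1677-09-21 00:12:43.145224 plus 192 ns, the same instant polars prints above. That would suggest the nanosecond output is the true value of the i64 minimum rather than a wrapped one.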

Log output

For the print:

thread '<unnamed>' panicked at /home/stijn/code/polars/crates/polars-arrow/src/temporal_conversions.rs:191:37:
invalid or out-of-range datetime
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/home/stijn/code/polars/py-polars/repro.py", line 42, in <module>
    print(result)
  File "/home/stijn/code/polars/py-polars/polars/series/series.py", line 558, in __str__
    s_repr: str = self._s.as_str()
                  ^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: invalid or out-of-range datetime

Issue description

Casting min/max i64 values to datetime works, but subsequent computation fails.

Expected behavior

The datetime value should be printed normally.

Installed versions

main branch

MarcoGorelli commented 9 months ago

I think this is just the usual chrono limitation of roughly +/- 262,000 years around the Unix epoch

use chrono; // 0.4.31

fn main() {
    let res = chrono::NaiveDateTime::from_timestamp_opt(8200000000000, 0);
    println!("res: {:?}", res);  // res: Some(+261817-08-28T09:46:40)
    let res = chrono::NaiveDateTime::from_timestamp_opt(8300000000000, 0);
    println!("res: {:?}", res);  // res: None
}
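
A rough way to relate that year range to the integer limits seen in the issue is to convert it to timestamps per time unit. The sketch below is back-of-the-envelope arithmetic, not chrono's actual bounds: the year range `MIN_YEAR`/`MAX_YEAR` is an assumption based on the "roughly 262,000 years" figure, an average Gregorian year is used instead of exact calendar math, and `approx_bounds` is a hypothetical helper.

```python
# Back-of-the-envelope bounds, not chrono's exact ones.
SECONDS_PER_YEAR = 31_556_952  # average Gregorian year: 365.2425 days * 86_400 s
MIN_YEAR, MAX_YEAR = -262_144, 262_143  # assumed chrono year range (roughly +/- 262,000)
I64_MIN, I64_MAX = -(2**63), 2**63 - 1

def approx_bounds(units_per_second: int) -> tuple[int, int]:
    """Approximate lowest/highest Unix timestamp inside the assumed year range."""
    lo = (MIN_YEAR - 1970) * SECONDS_PER_YEAR * units_per_second
    hi = (MAX_YEAR + 1 - 1970) * SECONDS_PER_YEAR * units_per_second
    return lo, hi

for unit, per_second in [("s", 1), ("us", 10**6), ("ns", 10**9)]:
    lo, hi = approx_bounds(per_second)
    # True when every i64 value maps to a year inside the assumed range.
    fits = lo <= I64_MIN and I64_MAX <= hi
    print(f"{unit}: lo={lo:.4e} hi={hi:.4e} whole i64 range fits: {fits}")
```

Under these assumptions the microsecond lower bound comes out near -8.3346e18, between the working and failing values reported in the issue. For nanoseconds the whole i64 range fits inside the year range, which may be why the nanosecond cast never hits the out-of-range panic.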

stinodego commented 9 months ago

Right, that makes sense. What about the nanosecond underflow, though? Should it not also raise invalid or out-of-range datetime?

Also, I think it would be good to add those limits to the docstring of the Datetime class (and perhaps Date/Time/Duration if those have similar limitations).

MarcoGorelli commented 9 months ago

will check about the nanosec one

agree on documenting this limit