pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

`dt.total_nanoseconds` and `dt.total_microseconds` may overflow silently #16057

Open stinodego opened 1 week ago

stinodego commented 1 week ago

Checks

Reproducible example

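```python
from datetime import timedelta

import polars as pl

# A Series built from timedelta objects has time unit "us" (microseconds).
s = pl.Series([timedelta(days=106752)])
# Converting to nanoseconds multiplies by 1000, which exceeds the i64 range here.
print(s.dt.total_nanoseconds())
```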

Log output

```
shape: (1,)
Series: '' [i64]
[
        -9223371273709551616
]
```

Issue description

The Series has time unit `us` (microseconds). Getting the total number of nanoseconds requires multiplying the underlying `i64` value by 1000, which may overflow. The same applies to `dt.total_microseconds` on a Series with time unit `ms`.
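The arithmetic behind the logged value can be checked with plain Python integers, which never overflow. This is only a sketch: the constant and variable names are made up for illustration, and the wraparound is simulated with modular arithmetic rather than taken from the Polars source.

```python
I64_MAX = 2**63 - 1
US_PER_DAY = 86_400 * 1_000_000  # microseconds in one day

# Value stored in the Series for timedelta(days=106752), time unit "us".
micros = 106752 * US_PER_DAY

# Exact product that total_nanoseconds would need to represent.
nanos = micros * 1000

# Simulate two's-complement wrapping of a 64-bit signed multiplication.
wrapped = (nanos + 2**63) % 2**64 - 2**63

print(nanos > I64_MAX)  # True: the exact result does not fit in i64
print(wrapped)          # -9223371273709551616, matching the log output
```

The microsecond value itself fits in `i64`; only the multiplication by 1000 pushes it out of range.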

We currently use unchecked multiplication here. Instead, we need checked multiplication that sets any overflowing value to null. The fix would be to implement checked arithmetic kernels and use them here.
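As a rough illustration of the intended semantics (not the proposed Rust kernel), a checked multiplication can be modelled in Python as follows. The helper `checked_mul_i64` is hypothetical and exists only for this sketch. In Rust, the per-element building block would presumably be something like `i64::checked_mul`, which returns `None` on overflow.

```python
from typing import Optional

I64_MIN = -(2**63)
I64_MAX = 2**63 - 1


def checked_mul_i64(value: Optional[int], factor: int) -> Optional[int]:
    """Hypothetical helper: multiply, returning None (null) if the
    result does not fit in a signed 64-bit integer."""
    if value is None:
        # Nulls propagate unchanged.
        return None
    result = value * factor  # Python ints are arbitrary precision
    if result < I64_MIN or result > I64_MAX:
        return None
    return result


# Microsecond values: one small, one null, and the one from the example above.
micros = [1, None, 9_223_372_800_000_000]
print([checked_mul_i64(v, 1000) for v in micros])  # [1000, None, None]
```

With this behavior, the reproducible example would yield null instead of a wrapped negative number.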

The offending code is here: https://github.com/pola-rs/polars/blob/eb7f9391cb7f2fb6984b6a72581168c5425abcfe/crates/polars-time/src/chunkedarray/duration.rs#L100-L108

Expected behavior

Output should be null rather than wrapping.

Installed versions

main