[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
from datetime import timedelta
import polars as pl
v = timedelta.max
s = pl.Series([v], dtype=pl.Duration("ms"))
print(s)
Log output
thread '<unnamed>' panicked at py-polars/src/conversion/any_value.rs:211:41:
called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'OverflowError'>, value: OverflowError('Python int too large to convert to C long'), traceback: None }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
File "/home/stijn/code/polars/py-polars/repro.py", line 7, in <module>
s = pl.Series([v], dtype=pl.Duration("ms"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/stijn/code/polars/py-polars/polars/series/series.py", line 315, in __init__
self._s = sequence_to_pyseries(
^^^^^^^^^^^^^^^^^^^^^
File "/home/stijn/code/polars/py-polars/polars/_utils/construction/series.py", line 194, in sequence_to_pyseries
py_series = PySeries.new_from_any_values(name, values, strict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'OverflowError'>, value: OverflowError('Python int too large to convert to C long'), traceback: None }
Issue description
The AnyValue conversion `py_object_to_any_value` does not receive the dtype. It first tries to parse the timedelta as a microsecond count, and for `timedelta.max` that count does not fit in a 64-bit integer, so the conversion overflows and the `unwrap()` panics.
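The overflow can be checked with plain integer arithmetic. This is only a sketch of the numbers involved, not the code path Polars takes; `I64_MAX` is a name introduced here for illustration.

```python
from datetime import timedelta

# Largest value a signed 64-bit integer can hold (illustrative constant).
I64_MAX = 2**63 - 1

v = timedelta.max

# Dividing two timedeltas with // yields an exact Python int, so no float rounding.
total_us = v // timedelta(microseconds=1)
total_ms = v // timedelta(milliseconds=1)

print(total_us > I64_MAX)  # True: the microsecond count overflows i64
print(total_ms > I64_MAX)  # False: the millisecond count fits
```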
The fix is to create a new conversion util `py_object_and_dtype_to_any_value`, which takes the data type in addition to the object, so that we can parse the value with the correct time unit directly. It would also allow skipping type inference, which should give a minor performance benefit.
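The idea can be sketched in Python, even though the real change would live in the Rust conversion code. The helper below, `duration_to_i64`, is hypothetical and not part of Polars; it only illustrates converting with a known time unit and reporting an out-of-range value as a regular error instead of a panic.

```python
from datetime import timedelta

I64_MIN, I64_MAX = -(2**63), 2**63 - 1


def duration_to_i64(value: timedelta, time_unit: str) -> int:
    """Hypothetical helper: convert a timedelta using the requested time unit."""
    # Exact microsecond count as an arbitrary-precision Python int.
    us = value // timedelta(microseconds=1)
    if time_unit == "ns":
        out = us * 1000
    elif time_unit == "us":
        out = us
    elif time_unit == "ms":
        out = us // 1000
    else:
        raise ValueError(f"unsupported time unit: {time_unit!r}")
    # Range-check only the final value, in the unit that was actually requested.
    if not I64_MIN <= out <= I64_MAX:
        raise OverflowError(f"{value!r} does not fit in an i64 with unit {time_unit!r}")
    return out


print(duration_to_i64(timedelta.max, "ms"))
```

With this shape, `timedelta.max` converts cleanly for `"ms"`, while `"us"` and `"ns"` raise an `OverflowError` that the caller can handle.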
To show that this should work, the following works fine:
from datetime import timedelta
import polars as pl
from polars._utils.convert import timedelta_to_int
v = timedelta.max
v_int = timedelta_to_int(v, "ms")
s = pl.Series([v_int]).cast(pl.Duration("ms"))
print(s)
Expected behavior
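Constructing the Series should succeed without a panic and give the same result as the integer-cast workaround above.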
Installed versions
main