pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Can't multiply or divide a Duration Series by an integer #14094

Open Wainberg opened 9 months ago

Wainberg commented 9 months ago

Reproducible example

>>> import datetime
>>> import polars as pl
>>> pl.Series([datetime.timedelta(hours=1)]) * 24
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "polars/series/series.py", line 1130, in __mul__
    raise TypeError(msg)
TypeError: first cast to integer before multiplying datelike dtypes
>>> pl.Series([datetime.timedelta(hours=1)]) / 24
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "polars/series/series.py", line 1083, in __truediv__
    raise TypeError(msg)
TypeError: first cast to integer before dividing datelike dtypes
>>> pl.Series([datetime.timedelta(hours=1)]) // 24
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "polars/series/series.py", line 1104, in __floordiv__
    raise TypeError(msg)
TypeError: first cast to integer before dividing datelike dtypes

Log output

No response

Issue description

Multiplying or dividing a Duration Series by an integer raises a TypeError. This was previously reported at https://github.com/pola-rs/polars/issues/9637, but as a feature enhancement rather than a bug.

For DataFrames, the result is also wrong, due to a different bug (https://github.com/pola-rs/polars/issues/12330). The operation succeeds but returns a plain i64 column instead of a Duration:

>>> pl.DataFrame([datetime.timedelta(hours=1)]) * 24
shape: (1, 1)
┌─────────────┐
│ column_0    │
│ ---         │
│ i64         │
╞═════════════╡
│ 86400000000 │
└─────────────┘
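The integer in that output can be checked by hand. The sketch below uses only the standard library and assumes the Duration column stores microseconds (which I believe is the Polars default for Python timedeltas, but treat it as an assumption). Under that assumption, 86400000000 is exactly 24 hours expressed as a raw microsecond count, so the arithmetic is right and only the dtype is lost.

```python
from datetime import timedelta

one_hour = timedelta(hours=1)

# Assumption: the Duration is stored as an integer number of microseconds.
one_hour_us = one_hour // timedelta(microseconds=1)  # 3_600_000_000

# Multiplying the raw integer by 24 gives the value shown in the DataFrame output.
product_us = one_hour_us * 24

# Interpreting that integer as microseconds again recovers a 24-hour duration.
as_duration = timedelta(microseconds=product_us)

print(product_us)
print(as_duration)
```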

Expected behavior

`* 24` should give 24 hours, `/ 24` should give 1/24 of an hour (2.5 minutes), and `// 24` should give 0 hours.
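For reference, this is how the standard library's `datetime.timedelta` handles multiplication and true division by an integer. It is a point of comparison only, not a statement of what Polars does or must do.

```python
from datetime import timedelta

one_hour = timedelta(hours=1)

# Multiplying a timedelta by an int scales the duration.
times_24 = one_hour * 24

# True division by an int yields another timedelta (here 2 minutes 30 seconds).
div_24 = one_hour / 24

print(times_24)
print(div_24)
```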

Installed versions

```
--------Version info---------
Polars: 0.20.6
Index type: UInt32
Platform: Linux-4.4.0-22621-Microsoft-x86_64-with-glibc2.35
Python: 3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 08:03:24) [GCC 12.3.0]
----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fsspec:
gevent:
hvplot:
matplotlib: 3.8.2
numpy: 1.26.3
openpyxl: 3.1.2
pandas: 2.2.0
pyarrow: 14.0.2
pydantic:
pyiceberg:
pyxlsb:
sqlalchemy:
xlsx2csv: 0.8.1
xlsxwriter: 3.1.9
```

itamarst commented 9 months ago

Integer division of timedelta(hours=1) by 24 should not be 0 hours. Consider that a delta can combine multiple units, e.g. timedelta(hours=1, minutes=2), so there is no single unit to floor to. Pandas seems to treat float and integer division the same.
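The standard library illustrates this point: floor division of a `timedelta` by an integer floors at the type's microsecond resolution, not at whole hours. The sketch below is plain Python with no Polars or pandas involved, and the variable names are only illustrative.

```python
from datetime import timedelta

# Floor division by an int does not collapse to 0 hours.
floor_simple = timedelta(hours=1) // 24

# A delta mixing units: 1 h 2 min = 3720 s, and 3720 / 24 = 155 s exactly.
mixed = timedelta(hours=1, minutes=2)
floor_mixed = mixed // 24
true_mixed = mixed / 24

# The two operators only differ below microsecond resolution:
# 3 us // 2 floors to 1 us, while 3 us / 2 rounds half-to-even to 2 us.
odd = timedelta(microseconds=3)
floor_odd = odd // 2
true_odd = odd / 2

print(floor_simple, floor_mixed, true_mixed, floor_odd, true_odd)
```

Dividing one `timedelta` by another with `//` is a different operation: it returns a plain integer count, e.g. `timedelta(hours=1) // timedelta(minutes=1)` is 60.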