pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Panic on datetime column min() #17713

Open Jex-y opened 1 month ago

Jex-y commented 1 month ago


Reproducible example

```python
from datetime import datetime

import polars as pl

df = pl.DataFrame().with_columns(
    pl.datetime_range(datetime(2024, 7, 18), datetime(2024, 7, 19), '15m').alias(
        'datetime'
    )
)

print(df['datetime'].min())

df = df.with_columns(pl.col('datetime').dt.replace_time_zone('UTC'))

print(df['datetime'].min())
```

```
2024-07-18 00:00:00
thread '<unnamed>' panicked at py-polars\src\conversion\any_value.rs:84:22:
called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'ValueError'>, value: ValueError("unexpected time zone offset: 'Europe/London'"), traceback: Some(<traceback object at 0x00000144BF81DB80>) }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "D:\code\river-level-analysis\hydro-api\tests\polars_bug.py", line 15, in <module>
    print(df['datetime'].min())
          ^^^^^^^^^^^^^^^^^^^^
  File "D:\code\river-level-analysis\hydro-api\.venv\Lib\site-packages\polars\series\series.py", line 1980, in min
    return self._s.min()
           ^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'ValueError'>, value: ValueError("unexpected time zone offset: 'Europe/London'"), traceback: Some(<traceback object at 0x00000144BF81DB80>) }
```

Log output

```
thread '<unnamed>' panicked at py-polars\src\conversion\any_value.rs:84:22:
called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'ValueError'>, value: ValueError("unexpected time zone offset: 'UTC'"), traceback: Some(<traceback object at 0x000002249D6ADBC0>) }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "D:\code\river-level-analysis\hydro-api\tests\polars_bug.py", line 15, in <module>
    print(df['datetime'].min())
          ^^^^^^^^^^^^^^^^^^^^
  File "D:\code\river-level-analysis\hydro-api\.venv\Lib\site-packages\polars\series\series.py", line 1980, in min
    return self._s.min()
           ^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'ValueError'>, value: ValueError("unexpected time zone offset: 'UTC'"), traceback: Some(<traceback object at 0x000002249D6ADBC0>) }
```

Issue description

This also happens with time zones other than UTC; for example, Europe/London produces the same error.
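Assuming the root cause is the one identified in the follow-up comment (a missing time zone database; on Windows, Python's standard `zoneinfo` module has no system database and relies on the `tzdata` package), a quick stdlib-only diagnostic is to try resolving each affected zone name directly. The helper `missing_time_zones` is hypothetical and not part of Polars.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def missing_time_zones(names: list[str]) -> list[str]:
    """Return the zone names this Python installation cannot resolve.

    Hypothetical diagnostic helper (not a Polars API). On Windows, zoneinfo
    depends on the `tzdata` package because there is no system tz database.
    """
    missing = []
    for name in names:
        try:
            ZoneInfo(name)
        except ZoneInfoNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list means both zones used in this issue can be loaded.
    print(missing_time_zones(["UTC", "Europe/London"]))
```

If both names show up as missing, the Python side cannot build a time-zone-aware datetime for them, which is consistent with the report.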

Expected behavior

`min()` should not panic; it should return a time-zone-aware datetime such as `2024-07-18 00:00:00+00:00`.
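For reference, the expected value can be reproduced with the standard library alone. This sketch assumes the same 15-minute range as the reproducer with both endpoints included (`closed='both'`, the `datetime_range` default), tagged as UTC; it only illustrates the expected output and does not exercise Polars.

```python
from datetime import datetime, timedelta, timezone

# Same range as the reproducer: 2024-07-18 00:00 to 2024-07-19 00:00 in
# 15-minute steps, both endpoints included (24 h * 4 steps + 1 = 97 values).
start = datetime(2024, 7, 18, tzinfo=timezone.utc)
end = datetime(2024, 7, 19, tzinfo=timezone.utc)
step = timedelta(minutes=15)

values = []
current = start
while current <= end:
    values.append(current)
    current += step

earliest = min(values)
print(len(values))  # 97
print(earliest)     # 2024-07-18 00:00:00+00:00
```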

Installed versions

```
>>> pl.show_versions()
--------Version info---------
Polars: 1.2.1
Index type: UInt32
Platform: Windows-10-10.0.19045-SP0
Python: 3.11.9 | packaged by Anaconda, Inc. | (main, Apr 19 2024, 16:40:41) [MSC v.1916 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fastexcel:
fsspec:
gevent:
great_tables:
hvplot:
matplotlib:
nest_asyncio:
numpy:
openpyxl:
pandas:
pyarrow:
pydantic:
pyiceberg:
sqlalchemy:
torch:
xlsx2csv:
xlsxwriter:
```
Jex-y commented 1 month ago

I forgot to install Polars with time zone support. With it installed, this now works; however, raising an error message instead of panicking would be an improvement.
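The behavior requested here, surfacing a Python exception instead of a Rust panic, can be sketched in pure Python. The sketch assumes a datetime value stored as integer microseconds since the Unix epoch (the Polars default time unit) and uses a hypothetical helper, `to_aware_datetime`, that is not Polars' actual conversion code: if the zone cannot be loaded, it raises an ordinary `ValueError` with an actionable message rather than aborting.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_aware_datetime(epoch_us: int, time_zone: str) -> datetime:
    """Convert epoch microseconds to a time-zone-aware datetime.

    Hypothetical helper for illustration only; not part of Polars.
    """
    try:
        tz = ZoneInfo(time_zone)
    except ZoneInfoNotFoundError as exc:
        # Report a recoverable error instead of crashing the process.
        raise ValueError(
            f"time zone {time_zone!r} could not be loaded; on Windows, "
            "install the `tzdata` package"
        ) from exc
    # Integer timedelta arithmetic avoids float rounding of microseconds.
    return (_EPOCH + timedelta(microseconds=epoch_us)).astimezone(tz)
```

For example, `to_aware_datetime(1_721_260_800_000_000, "UTC")` gives `2024-07-18 00:00:00+00:00` when a time zone database is available, and a clear `ValueError` when it is not. The caller can catch that error, which is not possible in the same way with a panic.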