pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs
Other
26.9k stars 1.65k forks source link

Support parsing timezone-aware datetimes in constructors when data type is also timezone-aware #16297

Open stinodego opened 2 weeks ago

stinodego commented 2 weeks ago

Description

I ran into this today, and I think we can improve behavior here.

Consider this code:

from datetime import datetime
from zoneinfo import ZoneInfo
import polars as pl

data = [datetime(1970, 1, 1, tzinfo=ZoneInfo("Iran"))]
s = pl.Series(data, dtype=pl.Datetime(time_zone="Iran"), strict=False)
ValueError: time-zone-aware datetimes are converted to UTC

Please either drop the time zone from the dtype, or set it to 'UTC'. To convert to a different time zone, please use `.dt.convert_time_zone`.

This is odd. If we're casting timezone-aware data anyway, might as well cast it to desired time zone, right?

One of the benefits of doing this is that timezone-aware data can then be roundtripped, like in one of our tests:

@given(s=series())
def test_to_list(s: pl.Series) -> None:
    values = s.to_list()
    result = pl.Series(values, dtype=s.dtype)
    assert_series_equal(s, result, categorical_as_str=True)

For reference, PyArrow seems to handle this a bit differently from us and they do support this:

import pyarrow as pa

data = [datetime(1970, 1, 1, tzinfo=ZoneInfo("Iran"))]
arr = pa.array(data, type=pa.timestamp("us", tz="Iran"))

print(arr)  # 1969-12-31 20:30:00.000000Z
print(arr.type)  # timestamp[us, tz=Iran]

I may be missing something here, but I thought I'd throw this out there. Let's see what @MarcoGorelli thinks 😄

MarcoGorelli commented 2 weeks ago

thanks for the ping

so, for .str.to_datetime, the current rules are:

The constructor should probably not be too different. Let's see:

For the last one, I think you're suggesting to convert to the given time zone. So long as it's clearly documented, and it's done for both the Series constructor and .str.to_datetime, I think it makes sense