pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

map_elements doesn't retain the datetime's timezone #19268

Open yiteng-guo opened 2 weeks ago

yiteng-guo commented 2 weeks ago


Reproducible example

```python
import pandas as pd
import polars as pl

dt = pd.Timestamp.now(tz="America/New_York").to_pydatetime()

pl.select(x=dt).select(pl.col.x.map_elements(lambda x: x, return_dtype=pl.Datetime(time_zone="America/New_York")))
# SchemaError: expected output type 'Datetime(Microseconds, Some("America/New_York"))', got 'Datetime(Microseconds, None)'; set `return_dtype` to the proper datatype

pl.select(x=dt).select(pl.col.x.map_elements(lambda x: x, return_dtype=pl.Datetime("ns", time_zone="America/New_York")))
# SchemaError: expected output type 'Datetime(Nanoseconds, Some("America/New_York"))', got 'Datetime(Microseconds, None)'; set `return_dtype` to the proper datatype

pl.select(x=pl.lit(dt, dtype=pl.Datetime("ns", time_zone="America/New_York")))

pl.select(x=pl.lit(pd.Timestamp.now(), dtype=pl.Datetime("ns", time_zone="America/New_York")))
```
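The two `map_elements` errors differ only in the requested dtype. A rough mental model helps here. It is a hypothetical stand-in, not Polars' actual Rust implementation: the requested `return_dtype` is compared against the dtype inferred from the function's output, and two `Datetime` dtypes are equal only if both the time unit and the time zone match. The names `DatetimeDtype`, `SchemaError` and `check_return_dtype` below are illustrative only. The observed output dtype is copied from the error messages above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatetimeDtype:
    """Hypothetical stand-in for a Datetime dtype: a time unit plus an optional time zone."""
    time_unit: str = "us"  # same default unit as pl.Datetime()
    time_zone: Optional[str] = None

    def __str__(self) -> str:
        # Render in the same style as the Polars error messages in this report.
        unit = {"ms": "Milliseconds", "us": "Microseconds", "ns": "Nanoseconds"}[self.time_unit]
        tz = f'Some("{self.time_zone}")' if self.time_zone is not None else "None"
        return f"Datetime({unit}, {tz})"


class SchemaError(Exception):
    """Hypothetical error type mirroring the message seen in the report."""


def check_return_dtype(expected: DatetimeDtype, got: DatetimeDtype) -> None:
    # Dataclass equality compares BOTH fields, so a differing unit or zone fails.
    if expected != got:
        raise SchemaError(
            f"expected output type '{expected}', got '{got}'; "
            "set `return_dtype` to the proper datatype"
        )


# Output dtype reported by the error messages: naive, microsecond precision.
observed = DatetimeDtype("us", None)

for requested in (
    DatetimeDtype("us", "America/New_York"),
    DatetimeDtype("ns", "America/New_York"),
):
    try:
        check_return_dtype(requested, observed)
    except SchemaError as exc:
        print(exc)
```

Running this prints the two messages from the report. In this model the first call fails only because the time zone was dropped. The second call also has a unit mismatch, so it would keep failing if the zone alone were preserved, unless the output were additionally cast to `ns`.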

Log output

No response

Issue description

I think there are two separate bugs/issues in these examples.

Expected behavior

All of the examples above should return a DataFrame with the specified dtype without raising an error.

Installed versions

```
In [17]: pl.show_versions()
--------Version info---------
Polars: 1.9.0
Index type: UInt32
Python: 3.10.13
```

cmdlineluser commented 2 weeks ago

@MarcoGorelli I think this may fall under the A-timeseries label when you get a chance.