Closed MarcoGorelli closed 1 year ago
It's a good question :)
This update inside def lit(...) could work as suggested...
if isinstance(value, datetime):
    tu = "us" if dtype is None else getattr(dtype, "tu", "us")
    e = lit(_datetime_to_pl_timestamp(value, tu)).cast(Datetime(tu))
    dtype_tz = dtype and getattr(dtype, "tz", None)
    if value.tzinfo is not None or dtype_tz:
        return e.dt.replace_time_zone(dtype_tz or str(value.tzinfo))
    return e
...though we should survey other uses of _datetime_to_pl_timestamp
to ensure we're being consistent.
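As a rough illustration of the precedence in the snippet above (a time zone on the dtype wins, otherwise the value's own tzinfo is used), here is a stdlib-only sketch. The helper name `resolve_time_zone` is hypothetical and not part of polars, and a named fixed-offset `timezone` stands in for `ZoneInfo("Asia/Tokyo")` so the example does not depend on a tz database.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_time_zone(value: datetime, dtype_tz: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper mirroring `dtype_tz or str(value.tzinfo)`."""
    if dtype_tz:
        # A time zone given on the dtype takes precedence.
        return dtype_tz
    if value.tzinfo is not None:
        # Otherwise fall back to the time zone attached to the value.
        return str(value.tzinfo)
    # Naive value and no dtype time zone: stay naive.
    return None


# Assumption: named fixed-offset zone used in place of ZoneInfo("Asia/Tokyo").
tokyo = timezone(timedelta(hours=9), "Asia/Tokyo")
d = datetime(2023, 1, 1, tzinfo=tokyo)

tz_d2 = resolve_time_zone(d)                      # no time zone on the dtype
tz_d3 = resolve_time_zone(d, "Europe/Berlin")     # dtype time zone given
tz_naive = resolve_time_zone(datetime(2023, 1, 1))
```

This only models which time zone name ends up on the result; it says nothing about how the underlying timestamp is computed.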
With the update in place we'd get the following:
d = datetime(2023, 1, 1, tzinfo=ZoneInfo("Asia/Tokyo"))
pl.DataFrame({
    "d1": [d],
    "d2": pl.select(pl.lit(d, dtype=pl.Datetime("ms"))).to_series(),
    "d3": pl.select(pl.lit(d, dtype=pl.Datetime("ns", "Europe/Berlin"))).to_series(),
})
# ┌──────────────────────────┬──────────────────────────┬─────────────────────────────┐
# │ d1                       ┆ d2                       ┆ d3                          │
# │ ---                      ┆ ---                      ┆ ---                         │
# │ datetime[μs, Asia/Tokyo] ┆ datetime[ms, Asia/Tokyo] ┆ datetime[ns, Europe/Berlin] │
# ╞══════════════════════════╪══════════════════════════╪═════════════════════════════╡
# │ 2023-01-01 00:00:00 JST  ┆ 2023-01-01 00:00:00 JST  ┆ 2023-01-01 00:00:00 CET     │
# └──────────────────────────┴──────────────────────────┴─────────────────────────────┘
Definitely better than the current behaviour, where the given dtype
timezone info gets ignored:
# ┌──────────────────────────┬──────────────────────────┬──────────────────────────┐
# │ d1                       ┆ d2                       ┆ d3                       │
# │ ---                      ┆ ---                      ┆ ---                      │
# │ datetime[μs, Asia/Tokyo] ┆ datetime[μs, Asia/Tokyo] ┆ datetime[ns, Asia/Tokyo] │
# ╞══════════════════════════╪══════════════════════════╪══════════════════════════╡
# │ 2023-01-01 00:00:00 JST  ┆ 2023-01-01 00:00:00 JST  ┆ 2023-01-01 00:00:00 JST  │
# └──────────────────────────┴──────────────────────────┴──────────────────────────┘
I think I'm in favour of your suggested/expected result. When a timezone is provided in the lit dtype, it does read as "take this value and give it this dtype" - I'm not sure I'd expect implicit conversions either...
Thanks for looking into this
I think this'd be consistent - the other public-facing place I see this being used is in date_range, where it also casts (rather than doing implicit UTC conversions):
In [3]: pl.date_range(datetime(2022, 1, 1, tzinfo=ZoneInfo('Asia/Kathmandu')), datetime(2022, 1, 2, tzinfo=ZoneInfo('Asia/Kathmandu')))
Out[3]:
shape: (2,)
Series: '' [datetime[μs, Asia/Kathmandu]]
[
    2022-01-01 00:00:00 +0545
    2022-01-02 00:00:00 +0545
]
In [6]: pl.date_range(datetime(2022, 1, 1, tzinfo=ZoneInfo('Asia/Kathmandu')), datetime(2022, 1, 2, tzinfo=ZoneInfo('Asia/Kathmandu')), time_zone='Europe/London')
---------------------------------------------------------------------------
ValueError: Given time_zone is different from that timezone aware datetimes. Given: 'Europe/London', got: 'Asia/Kathmandu'.
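The date_range behaviour shown above (reject a time_zone argument that disagrees with a timezone-aware input rather than convert silently) can be sketched with the stdlib alone. The function `check_time_zone` and its error wording are hypothetical, not the polars implementation, and a named fixed-offset `timezone` stands in for `ZoneInfo('Asia/Kathmandu')`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def check_time_zone(value: datetime, time_zone: Optional[str] = None) -> Optional[str]:
    """Hypothetical check: error on a mismatch instead of converting implicitly."""
    value_tz = None if value.tzinfo is None else str(value.tzinfo)
    if time_zone is not None and value_tz is not None and time_zone != value_tz:
        raise ValueError(
            f"Given time_zone {time_zone!r} differs from the datetime's time zone {value_tz!r}"
        )
    # Whichever of the two is set (they agree if both are set).
    return time_zone or value_tz


# Assumption: named fixed-offset zone used in place of ZoneInfo('Asia/Kathmandu').
kathmandu = timezone(timedelta(hours=5, minutes=45), "Asia/Kathmandu")
start = datetime(2022, 1, 1, tzinfo=kathmandu)

accepted = check_time_zone(start)  # no explicit time_zone: keep the value's zone

try:
    check_time_zone(start, "Europe/London")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```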
@alexander-beedie here you go https://github.com/pola-rs/polars/pull/6999
Problem description
This follows up on a discussion started here https://github.com/pola-rs/polars/pull/6991#discussion_r1110963272 , cc @alexander-beedie
In short, what should
pl.select(pl.lit(datetime(2020, 1, 1), dtype=pl.Datetime('us', 'Asia/Kathmandu')))
do?
I think the following two should return the same timestamp:
pl.Series(['2020-01-01']).str.strptime(pl.Datetime('us', 'Asia/Kathmandu'))
pl.select(pl.lit(datetime(2020, 1, 1), dtype=pl.Datetime('us', 'Asia/Kathmandu')))
Given
I'd expect
pl.select(pl.lit(datetime(2020, 1, 1), dtype=pl.Datetime('us', 'Asia/Kathmandu')))
to just set the 'Asia/Kathmandu' time zone (as opposed to doing any implicit conversion from UTC).
Furthermore, given that the following errors
I'd expect
to also error
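To make the distinction between "set the time zone" and "implicitly convert from UTC" concrete, here is a small stdlib-only sketch. It does not use polars; a named fixed-offset `timezone` stands in for 'Asia/Kathmandu' (an assumption made to avoid needing a tz database), and the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed +05:45 offset used in place of ZoneInfo('Asia/Kathmandu').
kathmandu = timezone(timedelta(hours=5, minutes=45), "Asia/Kathmandu")

naive = datetime(2020, 1, 1)

# "Set" semantics: keep the wall-clock time and attach the zone.
set_tz = naive.replace(tzinfo=kathmandu)

# "Convert" semantics: treat the naive value as UTC, then shift into the zone.
converted = naive.replace(tzinfo=timezone.utc).astimezone(kathmandu)

print(set_tz.isoformat())     # wall clock stays at midnight
print(converted.isoformat())  # wall clock shifted by the UTC offset
```

The two results show different wall-clock times and refer to different instants, which is why the choice between them matters for lit.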