Closed ion-elgreco closed 10 months ago
so starting from a tz-aware datetime, and a column of time zones timezone
, you'd like to get the local datetime for each row according to the time zone in the timezone
column
not sure about the name, but a useful way to do this does seem useful. I don't think there's currently any way to do it lazily
@MarcoGorelli correct!
Yeah this pl.when.then trick works only if you know the timezones that the data will contain ahead.
Got a suggestion from @orlp (which I think looks really good):
I think this should be a pair of functions some variation of to_naive and from_naive
Expr.dt.to_naive(tz: pl.Expr)
would return the local time as per the given timezone for each column, resulting in a pl.DateTime(..., time_zone=None)Expr.dt.from_naive(naive_tz: pl.Expr, out_tz: str)
would accept a column of datetimes with a None timezone, where each timestamp is assumed to be in the format by naive_tz and converted to a pl.DateTime(..., time_zone=out_tz)
i.e.:
new_column = pl.col('ts').dt.to_naive('time_zone')
ts | time_zone | new_column
---------------------------------------------------------------------
datetime[us, UTC] | str | datetime[us]
2020-01-01 +0200 | Amsterdam | 2019-12-31 23:00:00
2020-01-01 +0300 | Kathmandu | 2020-01-01 02:45:00
and
new_column = pl.col('ts').dt.from_naive('time_zone', 'UTC')
ts | time_zone | new_column
---------------------------------------------------------------------
datetime[us] | str | datetime[us, UTC]
2019-12-31 23:00:00 | Amsterdam | 2020-01-01 +0200
2020-01-01 02:45:00 | Kathmandu | 2020-01-01 +0200
@MarcoGorelli ah nice one, then we go both directions. Alright, I will add the other one as well.
I still find naive not saying much imho then I think your naive_local is more clear than if we want to mention naive
I still find naive not saying much imho then I think your naive_local is more clear than if we want to mention naive
I think to_local_datetime
is fine too, the comment was just to demonstrate the pair of functions
I'd suggest naming it like that if you prefer it, we can always do a final function name bikeshed before shipping it 😄
@MarcoGorelli is it okay to change the python expr naming only or should it be consistent across the rust and python API?
probably best to be consistent
I like the functionality this adds, but it also overlaps with the existing functionality, which can be confusing.
to_naive(None)
would be the same as replace_time_zone(None)
. And from_naive(None, tz)
would be the same as replace_time_zone(tz)
.
With regards to naming, maybe it can be called remove_time_zone
/add_time_zone
. That makes it clear it's about time zones and that it the input/output will be tz-aware/tz-naive or vice versa. The additional time zone expression can me thought of as "according to this time zone". Also gels well with the existing methods.
@stinodego I would say that's a small overlap :) Both target different use cases
Regarding the naming, we are not only removing or adding a timezone. We rather do convert
to TZ and then replace
TZ with None, and the other way around. Only saying remove or add doesn't make that entirely clear and could give different impressions of what it does.
From all the options my preference would go to the to/from_naive_local_datetime
@MarcoGorelli any input on the naming of these functions? I find it hard to decide. In your presentation at pydata, you made a crystal clear case for when to use replace_time_zone
vs convert_time zone
. I feel like adding these two new methods makes things a bit muddy.
I'd like to look at the 4 method names and know what they do immediately. Maybe they shouldn't be separate methods but parameters instead? I'm not sure. Input is appreciated.
I'm honestly not sure - you're right that it's gone from "2 clear and distinct functions" to "4 functions with some overlap"
I'm tempted to say that there two new ones could be in a Polars plugin instead, so that Polars only has the most commonly used time zone functions (and they can remain simple and clearly named), and the relatively less common case of converting from multiple time zones can be handled there?
Maybe I could rename polars-business
to polars-tse
(Polars Time Series Extras) and we put this in?
@MarcoGorelli happy to migrate it to your plugin
Thanks guys. I'll be closing this then!
This functionality will be available in https://github.com/MarcoGorelli/polars-xdt (see PR https://github.com/MarcoGorelli/polars-xdt/pull/42)
Description
Let's say we have a dataframe with two columns. The current date in UTC and the local timezone. Currently it's not that easy to convert it first the local timezone and then replace the timezone with None without using pl.when.then with a filter.
It would be nice if we could do this in one go:
df.select(local_date=pl.col('date_col').dt.convert_and_replace_timezone(convert=pl.col('timezone'), replace=None)
.This way the end result is in one datetime and tz format but the intermediate step can hold different tz's.
@MarcoGorelli