Open martheveldhuis opened 8 months ago
This works as a workaround:
df_nl["B"] = df_nl["A"].shift(freq=pd.Timedelta("1h"))
Ideally shift
would handle pd.DateOffset
DST jumps.
Another example:
pd.Timestamp("2024-04-25", tz="Africa/Cairo") + pd.DateOffset(days=1)
which raises
pytz.exceptions.NonExistentTimeError: 2024-04-26 00:00:00
I think the best solution would be to add the options 'nonexistent' and 'ambiguous' to pd.DateOffset
(similar as we have for e.g. the floor method), such that one can do:
pd.Timestamp("2024-04-25", tz="Africa/Cairo") + pd.DateOffset(days=1, nonexistent="shift_forward", ambiguous=False)
and get as result:
Timestamp('2024-04-26 01:00:00+0300', tz='Africa/Cairo')
I think that having this capability will also make it easier to resolve bugs like #58380 and #51211.
This works as a workaround:
df_nl["B"] = df_nl["A"].shift(freq=pd.Timedelta("1h"))
Ideally
shift
would handlepd.DateOffset
DST jumps.
Unfortunately, this doesn't solve the issue. I will provide a better example:
start_date = pd.to_datetime("2024-03-30 07:00:00").tz_localize("Europe/Amsterdam")
end_date = start_date + timedelta(weeks=1)
datetime_index = pd.date_range(start=start_date, end=end_date, freq="h")
df = pd.DataFrame({"A": range(len(datetime_index))}, index=datetime_index)
df["B"] = df["A"].shift(freq=pd.Timedelta(weeks=1))
print(df)
Which outputs:
A B
2024-03-30 07:00:00+01:00 0 NaN
2024-03-30 08:00:00+01:00 1 NaN
2024-03-30 09:00:00+01:00 2 NaN
2024-03-30 10:00:00+01:00 3 NaN
2024-03-30 11:00:00+01:00 4 NaN
... ... ...
2024-04-06 04:00:00+02:00 164 NaN
2024-04-06 05:00:00+02:00 165 NaN
2024-04-06 06:00:00+02:00 166 NaN
2024-04-06 07:00:00+02:00 167 NaN
2024-04-06 08:00:00+02:00 168 0.0
[169 rows x 2 columns]
Even though I would expect:
A B
2024-03-30 07:00:00+01:00 0 NaN
2024-03-30 08:00:00+01:00 1 NaN
2024-03-30 09:00:00+01:00 2 NaN
2024-03-30 10:00:00+01:00 3 NaN
2024-03-30 11:00:00+01:00 4 NaN
... ... ...
2024-04-06 04:00:00+02:00 164 NaN
2024-04-06 05:00:00+02:00 165 NaN
2024-04-06 06:00:00+02:00 166 NaN
2024-04-06 07:00:00+02:00 167 0.0
2024-04-06 08:00:00+02:00 168 1.0
[169 rows x 2 columns]
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
This last line gives an error:
pytz.exceptions.NonExistentTimeError: 2024-03-31 02:00:00
With full traceback:
Expected Behavior
This would be the desired ouput:
The point of converting a UTC timeseries to Europe/Amsterdam time is that I want to look up behaviour of people, which stays consistent to their timezone. E.g. if someone goes to work every day at 08:00, that remains at 08:00 in their timezone, even after the daylight savings shift. In UTC, that person appears to leave one hour earlier (at 07:00). By converting to Europe/Amsterdam time, then shifting, this should be handled correctly.
Installed Versions