pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: NonExistentTimeError with resample("7D"), but only when DST day is not part of the date range #58380

Open kdebrab opened 2 months ago

kdebrab commented 2 months ago

Reproducible Example

import pandas as pd

ts = pd.Series(1, pd.date_range("2024-04-19", "2024-04-20", tz="Africa/Cairo", freq="15min"))
ts.resample("7D").sum()

Issue Description

The example above fails with pytz.exceptions.NonExistentTimeError: 2024-04-26 00:00:00, even though that timestamp falls outside the date range.

Strangely, when the end date is later, so that the problematic date is included in the result, it no longer fails:

ts = pd.Series(1, pd.date_range("2024-04-19", "2024-04-27", tz="Africa/Cairo", freq="15min"))
ts.resample("7D").sum()

returns:

2024-04-19 00:00:00+02:00    672
2024-04-26 01:00:00+03:00     93
Freq: 7D, dtype: int64

Expected Behavior

No failure and the correct result, namely:

2024-04-19 00:00:00+02:00    97
Freq: 7D, dtype: int64
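Until this is fixed, one possible workaround (a sketch, not a fix for the underlying bug) is to drop the timezone, resample on wall-clock time, and localize the resulting labels again with nonexistent="shift_forward". This assumes wall-clock bins are acceptable; they can differ from timezone-aware bins when the data spans a DST transition, and labels that fall in an ambiguous hour would need the ambiguous parameter as well.

```python
import pandas as pd

ts = pd.Series(1, pd.date_range("2024-04-19", "2024-04-20", tz="Africa/Cairo", freq="15min"))

tz = ts.index.tz

# Strip the timezone but keep local wall-clock times.
naive = ts.tz_localize(None)

# Resampling the naive series never has to build a tz-aware 2024-04-26 00:00 label.
result = naive.resample("7D").sum()

# Re-attach the timezone; a label on a skipped wall time is moved forward.
result.index = result.index.tz_localize(tz, nonexistent="shift_forward")

print(result)
```

For the reproducible example this yields a single bin labelled 2024-04-19 00:00:00+02:00 with a sum of 97, matching the expected result above.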

Installed Versions

INSTALLED VERSIONS
------------------
commit                : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python                : 3.10.4.final.0
python-bits           : 64
OS                    : Windows
OS-release            : 10
Version               : 10.0.19045
machine               : AMD64
processor             : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder             : little
LC_ALL                : None
LANG                  : None
LOCALE                : English_Belgium.1252

pandas                : 2.2.2
numpy                 : 1.26.4
pytz                  : 2024.1
dateutil              : 2.9.0.post0
setuptools            : 69.5.1
pip                   : 23.3.1
Cython                : None
pytest                : 7.4.2
hypothesis            : None
sphinx                : None
blosc                 : None
feather               : None
xlsxwriter            : 3.1.2
lxml.etree            : 5.2.1
html5lib              : None
pymysql               : None
psycopg2              : None
jinja2                : 3.1.3
IPython               : 8.4.0
pandas_datareader     : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
bottleneck            : 1.3.8
dataframe-api-compat  : None
fastparquet           : 2023.2.0
fsspec                : 2024.3.1
gcsfs                 : None
matplotlib            : 3.7.5
numba                 : None
numexpr               : 2.10.0
odfpy                 : None
openpyxl              : 3.1.2
pandas_gbq            : None
pyarrow               : 15.0.2
pyreadstat            : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.13.0
sqlalchemy            : 2.0.28
tables                : None
tabulate              : 0.9.0
xarray                : 2024.3.0
xlrd                  : None
zstandard             : None
tzdata                : 2024.1
qtpy                  : None
pyqt5                 : None

MarognaLorenzo commented 2 months ago

take

kdebrab commented 2 months ago

FYI, the same error occurs when resampling to daily resolution if the last day of the range is the day before the DST transition:

ts = pd.Series(1, pd.date_range("2024-04-19", "2024-04-25", tz="Africa/Cairo", freq="15min"))
ts.resample("D").sum()

It no longer fails as soon as the end date is later, e.g.:

ts = pd.Series(1, pd.date_range("2024-04-19", "2024-04-27", tz="Africa/Cairo", freq="15min"))
ts.resample("D").sum()