pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: pd.date_range with DST-crossing has incorrect freq attribute #35388

Open rwijtvliet opened 4 years ago

rwijtvliet commented 4 years ago

Code Sample, a copy-pastable example

import pandas as pd

# Four daily timestamps spanning the DST transition on 2020-03-29 in Europe/Berlin
i = pd.date_range('2020-03-28', periods=4, freq='D', tz='Europe/Berlin')
i
# DatetimeIndex(['2020-03-28 00:00:00+01:00', '2020-03-29 00:00:00+01:00',
#                '2020-03-30 00:00:00+02:00', '2020-03-31 00:00:00+02:00'],
#               dtype='datetime64[ns, Europe/Berlin]', freq='D')
i[0] + i.freq == i[1]  # True
i[2] + i.freq == i[3]  # True
i[1] + i.freq == i[2]  # False (!)

Problem description

Timestamps whose spacing varies because of a DST transition are not handled correctly. The local day 2020-03-29 is only 23 hours long in Europe/Berlin, yet the result of i[1] + i.freq is Timestamp('2020-03-30 01:00:00+0200', tz='Europe/Berlin', freq='D'), i.e. a full 24 hours later, whereas Timestamp('2020-03-30 00:00:00+0200', tz='Europe/Berlin', freq='D') was expected.
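The uneven spacing can be seen directly by differencing the index. This is a minimal sketch, assuming the index is built exactly as in the code sample; the variable name `gaps` is illustrative only.

```python
import pandas as pd

# Same index as in the report: daily wall-clock midnights in Europe/Berlin.
i = pd.date_range('2020-03-28', periods=4, freq='D', tz='Europe/Berlin')

# Elapsed time between consecutive timestamps (a TimedeltaIndex).
gaps = i[1:] - i[:-1]
print(list(gaps))
```

The middle gap is 23 hours rather than 24, because the clocks move forward during 2020-03-29.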

By contrast, variable spacing that comes from calendar irregularities, such as months of different lengths, is handled correctly.
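For comparison, a short sketch of the month-length case, using `pd.offsets.MonthEnd()` as the frequency. The names `m` and `steps_ok` are illustrative, not taken from the report.

```python
import pandas as pd

# Month ends are irregularly spaced (29, 31, 30 days apart here).
m = pd.date_range('2020-01-31', periods=4, freq=pd.offsets.MonthEnd())

# Each element plus the index's freq should land exactly on the next element.
steps_ok = [m[k] + m.freq == m[k + 1] for k in range(len(m) - 1)]
print(steps_ok)
```

Here every step is consistent with the freq attribute, which is the behavior the report expects for the DST case as well.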

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.8.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : de_DE.cp1252
pandas : 1.0.5
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0.post20200714
Cython : None
pytest : 5.4.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.2.2
numexpr : None
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.3
pyxlsb : None
s3fs : None
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : None

jbrockmendel commented 4 years ago

cc @mroeschke this looks like a bug. I guess for now dti.freq should be None until we get a DayDST offset?
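One way to detect an index whose freq attribute does not match its values is to step through it explicitly. The helper `freq_matches_values` below is hypothetical and not part of pandas; it is only a sketch of the check implied by the code sample.

```python
import pandas as pd


def freq_matches_values(idx: pd.DatetimeIndex) -> bool:
    """Return True if every element plus idx.freq equals the next element."""
    if idx.freq is None:
        raise ValueError("index has no freq to check")
    return all(idx[k] + idx.freq == idx[k + 1] for k in range(len(idx) - 1))


# A tz-naive daily index has no DST transitions to worry about.
naive = pd.date_range('2020-03-28', periods=4, freq='D')
print(freq_matches_values(naive))

# The tz-aware index from the report; the result may depend on the pandas version.
aware = pd.date_range('2020-03-28', periods=4, freq='D', tz='Europe/Berlin')
print(freq_matches_values(aware))
```

On the pandas 1.0.5 setup from the report, the second call would be expected to return False, since i[1] + i.freq != i[2] there.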

mroeschke commented 4 years ago

Yeah, Timestamp + offset arithmetic across DST transitions is generally still not well supported.
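As an illustration of the two kinds of "one day" involved, the sketch below compares `pd.Timedelta(days=1)` (24 elapsed hours) with `pd.DateOffset(days=1)` (next calendar day at the same wall-clock time). It is offered as a possible workaround for single timestamps, not as a fix for the freq attribute; the variable names are illustrative.

```python
import pandas as pd

# Midnight on the day the clocks move forward in Europe/Berlin.
ts = pd.Timestamp('2020-03-29', tz='Europe/Berlin')

# Timedelta adds 24 elapsed hours, so the wall clock lands on 01:00.
elapsed = ts + pd.Timedelta(days=1)

# DateOffset(days=1) advances the calendar date and keeps the wall-clock time.
calendar = ts + pd.DateOffset(days=1)

print(elapsed)   # 2020-03-30 01:00:00+02:00
print(calendar)  # 2020-03-30 00:00:00+02:00
```

The first result matches what the report observed for i[1] + i.freq, and the second matches what the report expected.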

jbrockmendel commented 4 years ago

edited the title