pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: date_range fails when crossing DST boundary #40336

Open teddyward opened 3 years ago

teddyward commented 3 years ago


Code Sample, a copy-pastable example

import pandas as pd
pd.date_range('2020-03-09 00:00:00-07:00', '2021-03-09 00:00:00-08:00', tz='America/Los_Angeles')
...
TypeError: Start and end cannot both be tz-aware with different timezones

Problem description

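The start and end timestamps in the code above are both valid timestamps in the America/Los_Angeles timezone (one falls in daylight saving time at -07:00, the other in standard time at -08:00), but date_range raises a TypeError when they are passed directly as start and end.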
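As a quick check of that claim, the sketch below parses both strings and converts them to America/Los_Angeles. If the UTC offset is unchanged after conversion, the string is a valid rendering of a local time in that zone. The names ZONE, start, end, and local are only illustrative.

```python
import pandas as pd

ZONE = "America/Los_Angeles"

# Each string parses to a tz-aware Timestamp carrying a fixed UTC offset.
start = pd.Timestamp("2020-03-09 00:00:00-07:00")
end = pd.Timestamp("2021-03-09 00:00:00-08:00")

for ts in (start, end):
    # tz_convert keeps the instant and re-expresses it in the named zone.
    local = ts.tz_convert(ZONE)
    # An unchanged offset means the zone really uses that offset at this
    # instant, so the original string is a valid Los Angeles timestamp.
    assert local.utcoffset() == ts.utcoffset()
    print(ts.isoformat(), "->", local.isoformat())
```

The two endpoints differ only in which side of a DST transition they fall on, which is why their fixed offsets differ.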

Expected Output

I would expect this to match the output from the following:

>>> pd.date_range('2020-03-09', '2021-03-09', freq='H', tz='America/Los_Angeles')
DatetimeIndex(['2020-03-09 00:00:00-07:00', '2020-03-09 01:00:00-07:00',
               '2020-03-09 02:00:00-07:00', '2020-03-09 03:00:00-07:00',
               '2020-03-09 04:00:00-07:00', '2020-03-09 05:00:00-07:00',
               '2020-03-09 06:00:00-07:00', '2020-03-09 07:00:00-07:00',
               '2020-03-09 08:00:00-07:00', '2020-03-09 09:00:00-07:00',
               ...
               '2021-03-08 15:00:00-08:00', '2021-03-08 16:00:00-08:00',
               '2021-03-08 17:00:00-08:00', '2021-03-08 18:00:00-08:00',
               '2021-03-08 19:00:00-08:00', '2021-03-08 20:00:00-08:00',
               '2021-03-08 21:00:00-08:00', '2021-03-08 22:00:00-08:00',
               '2021-03-08 23:00:00-08:00', '2021-03-09 00:00:00-08:00'],
              dtype='datetime64[ns, America/Los_Angeles]', length=8762, freq='H')

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : f2c8480af2f25efdbd803218b9d87980f416563e
python           : 3.7.3.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.6.0
Version          : Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.2.3
numpy            : 1.16.6
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.2.4
setuptools       : 47.1.1
Cython           : 0.29.21
pytest           : 6.0.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : 1.3.7
lxml.etree       : 4.5.2
html5lib         : None
pymysql          : None
psycopg2         : 2.8.4 (dt dec pq3 ext lo64)
jinja2           : 2.11.2
IPython          : None
pandas_datareader: None
bs4              : 4.9.1
bottleneck       : None
fsspec           : 0.7.4
fastparquet      : None
gcsfs            : 0.6.2
matplotlib       : 3.3.0
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : 0.4.2
scipy            : 1.5.2
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : 1.2.0
xlwt             : None
numba            : 0.50.1

teddyward commented 2 years ago

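This bug has resurfaced now that we are in the window around the fall daylight saving time transition, where a date and the same date one year later have different UTC offsets. Both of the timestamps given below are valid in America/Los_Angeles: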

>>> pd.date_range('2020-11-02 00:00:00-08:00', '2021-11-02 00:00:00-07:00', tz='America/Los_Angeles')
Traceback (most recent call last):
  File "/Users/teddyward/.virtualenvs/kapi/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 2422, in _infer_tz_from_endpoints
    inferred_tz = timezones.infer_tzinfo(start, end)
  File "pandas/_libs/tslibs/timezones.pyx", line 328, in pandas._libs.tslibs.timezones.infer_tzinfo
AssertionError: Inputs must both have the same timezone, pytz.FixedOffset(-480) != pytz.FixedOffset(-420)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/teddyward/.virtualenvs/kapi/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 1104, in date_range
    **kwargs,
  File "/Users/teddyward/.virtualenvs/kapi/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 419, in _generate_range
    tz = _infer_tz_from_endpoints(start, end, tz)
  File "/Users/teddyward/.virtualenvs/kapi/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 2427, in _infer_tz_from_endpoints
    ) from err
TypeError: Start and end cannot both be tz-aware with different timezones
>>>
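
The traceback shows that the two endpoints are compared by their parsed fixed offsets (-480 vs. -420 minutes) before the tz argument is considered. One possible workaround, sketched below, is to convert both endpoints to the target zone before calling date_range. The helper name date_range_in_zone is hypothetical and not part of pandas, and the hourly frequency is an assumption taken from the expected output earlier in this issue.

```python
import pandas as pd


def date_range_in_zone(start, end, zone, freq):
    """Hypothetical helper: put both endpoints in `zone`, then build the range.

    Assumes `start` and `end` carry a UTC offset; tz_convert raises a
    TypeError for naive timestamps.
    """
    start_local = pd.Timestamp(start).tz_convert(zone)
    end_local = pd.Timestamp(end).tz_convert(zone)
    # Both endpoints now share one timezone, so date_range does not see
    # two different fixed offsets.
    return pd.date_range(start_local, end_local, freq=freq)


idx = date_range_in_zone(
    "2020-11-02 00:00:00-08:00",
    "2021-11-02 00:00:00-07:00",
    "America/Los_Angeles",
    pd.offsets.Hour(),
)
print(len(idx), idx[0], idx[-1])
```

This sidesteps the error rather than fixing it. Whether date_range should perform this conversion itself when tz is given is the question this issue raises.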