pandas-dev / pandas


BUG: date_range from tz localized start date is wrong #52801

Closed zljubisic closed 1 year ago

zljubisic commented 1 year ago

Pandas version checks

Reproducible Example

# script.py
import datetime as dt
import pandas as pd
import pytz

print(f"{pd.__version__=}")
print(f"{pytz.__version__=}\n")

tz = "Europe/Stockholm"

# creation of tz localized date
d = dt.datetime(2023,4,1).replace(tzinfo=pytz.timezone(tz))
print(f"{d=}, {str(d)=}\n")

print(f"{d.tzinfo=}, {str(d.tzinfo)=}\n")

# creation of tz localized date range
idx = pd.date_range(start=d, periods=10, freq='H')

# idx additional tz parameter
idx2 = pd.date_range(start=d, periods=10, freq='H', tz=tz)

val = range(len(idx))

print(f"{idx=}\n")
print(f"{idx2=}\n")
print(f"idx == idx2", idx==idx2, '\n')

Output of script.py:

pd.__version__='2.0.0'
pytz.__version__='2023.3'

d=datetime.datetime(2023, 4, 1, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' LMT+0:53:00 STD>), str(d)='2023-04-01 00:00:00+00:53'

d.tzinfo=<DstTzInfo 'Europe/Stockholm' LMT+0:53:00 STD>, str(d.tzinfo)='Europe/Stockholm'

idx=DatetimeIndex(['2023-04-01 01:07:00+02:00', '2023-04-01 02:07:00+02:00',
               '2023-04-01 03:07:00+02:00', '2023-04-01 04:07:00+02:00',
               '2023-04-01 05:07:00+02:00', '2023-04-01 06:07:00+02:00',
               '2023-04-01 07:07:00+02:00', '2023-04-01 08:07:00+02:00',
               '2023-04-01 09:07:00+02:00', '2023-04-01 10:07:00+02:00'],
              dtype='datetime64[ns, Europe/Stockholm]', freq='H')

idx2=DatetimeIndex(['2023-04-01 01:07:00+02:00', '2023-04-01 02:07:00+02:00',
               '2023-04-01 03:07:00+02:00', '2023-04-01 04:07:00+02:00',
               '2023-04-01 05:07:00+02:00', '2023-04-01 06:07:00+02:00',
               '2023-04-01 07:07:00+02:00', '2023-04-01 08:07:00+02:00',
               '2023-04-01 09:07:00+02:00', '2023-04-01 10:07:00+02:00'],
              dtype='datetime64[ns, Europe/Stockholm]', freq='H')

idx == idx2 [ True  True  True  True  True  True  True  True  True  True]

Issue Description

If I create a date_range() from a tz-aware datetime, the first element of idx is 2023-04-01 01:07:00+02:00 instead of the expected 2023-04-01 00:00:00+02:00. Where does this 01:07:00+02:00 come from?

If date_range() is created from a tz-naive datetime and then tz_localize-d afterwards, everything works as expected.

Expected Behavior

I believe that date_range() shouldn't change the very first element of idx, so it should be 2023-04-01 00:00:00+02:00. The following gives the correct result: pd.date_range(start=dt.datetime(2023,4,1), periods=10, freq='H', tz="Europe/Stockholm") (start is not tz-aware, but tz is specified)

DatetimeIndex(['2023-04-01 00:00:00+02:00', '2023-04-01 01:00:00+02:00',
               '2023-04-01 02:00:00+02:00', '2023-04-01 03:00:00+02:00',
               '2023-04-01 04:00:00+02:00', '2023-04-01 05:00:00+02:00',
               '2023-04-01 06:00:00+02:00', '2023-04-01 07:00:00+02:00',
               '2023-04-01 08:00:00+02:00', '2023-04-01 09:00:00+02:00'],
              dtype='datetime64[ns, Europe/Stockholm]', freq='H')

Installed Versions


INSTALLED VERSIONS

commit : 478d340667831908b5b4bf09a2787a11a14560c9
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1127.el7.x86_64
Version : #1 SMP Tue Mar 31 23:36:51 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : None.None

pandas : 2.0.0
numpy : 1.24.2
pytz : 2023.3
dateutil : 2.8.2
setuptools : 61.2.0
pip : 21.2.4
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

mroeschke commented 1 year ago

d = dt.datetime(2023,4,1).replace(tzinfo=pytz.timezone(tz))

This is incorrect usage of pytz timezones. You'll need to use pytz.timezone(tz).localize(dt.datetime(...)) instead to get the correct date. Closing as a usage question
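
As a rough illustration of where 01:07 comes from: the output above shows that replace(tzinfo=...) attached the zone's LMT+0:53 offset rather than the +02:00 offset in force on that date. The sketch below reproduces only the arithmetic, using standard-library fixed offsets as stand-ins for the pytz objects (the names LMT and CEST are illustrative, not part of pytz or pandas).

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for the two zone states seen in the output above.
# These are illustrative stand-ins, not pytz timezone objects.
LMT = timezone(timedelta(minutes=53), "LMT")   # offset attached by replace(tzinfo=...)
CEST = timezone(timedelta(hours=2), "CEST")    # offset in force on 2023-04-01

# What replace(tzinfo=pytz.timezone(tz)) effectively produced: midnight at +00:53.
wrong_start = datetime(2023, 4, 1, tzinfo=LMT)

# The same instant expressed in UTC and then at +02:00.
as_utc = wrong_start.astimezone(timezone.utc)
as_cest = wrong_start.astimezone(CEST)

print(as_utc)   # 2023-03-31 23:07:00+00:00
print(as_cest)  # 2023-04-01 01:07:00+02:00
```

Midnight at +00:53 is 23:07 UTC on the previous day, which is 01:07 at +02:00. That matches the first element of idx, so date_range() is converting the start value it was given faithfully; the start value itself carries the wrong offset.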

zljubisic commented 1 year ago

@mroeschke You are right. If I use pytz as you have suggested, everything works as expected. Thank you very much, and best regards.

PS Stupid chatgpt4
