pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: Timestamp.tz and DatetimeIndex.tz are inconsistent when pytz 2024.2 is installed #59833

Open boschmic opened 3 days ago

boschmic commented 3 days ago

Reproducible Example

import pandas as pd
t0 = pd.Timestamp("01-01-2000")
print(repr(t0.tz_localize("CET").tz))
print(repr(pd.DatetimeIndex([t0]).tz_localize("CET").tz))

Issue Description

If pytz 2024.2 is installed (pinned as pytz = "==2024.2"), the example prints:

<DstTzInfo 'CET' CET+1:00:00 STD>
<DstTzInfo 'CET' LMT+0:18:00 STD>

Downgrading pytz to 2024.1 resolves this issue.

Expected Behavior

The .tz property should produce the same result regardless of whether the object is a Timestamp or a DatetimeIndex. Hence, I expect this example to print:

<DstTzInfo 'CET' CET+1:00:00 STD>
<DstTzInfo 'CET' CET+1:00:00 STD>
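For anyone who only needs to confirm that both objects agree on the actual UTC offset, a minimal sketch is to compare the offsets of the localized values instead of the repr of .tz. This assumes pandas is installed and that "CET" resolves to a zone with a +01:00 offset in January 2000; the variable names are illustrative only.

```python
from datetime import timedelta

import pandas as pd

t0 = pd.Timestamp("2000-01-01")

# Localize a scalar Timestamp and a one-element DatetimeIndex to the same zone.
ts = t0.tz_localize("CET")
idx = pd.DatetimeIndex([t0]).tz_localize("CET")

# Compare the UTC offset of the localized values, not the tzinfo repr.
ts_offset = ts.utcoffset()
idx_offset = idx[0].utcoffset()
offsets_match = ts_offset == idx_offset

print(ts_offset, idx_offset, offsets_match)
```

This does not change what .tz returns; it only sidesteps the repr difference when the offset is what matters.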

Installed Versions

INSTALLED VERSIONS
------------------
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.11.10.final.0
python-bits : 64
OS : Linux
OS-release : 6.8.0-40-generic
Version : #40~22.04.3-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 30 17:30:19 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.2
numpy : 2.1.1
pytz : 2024.2
dateutil : 2.9.0.post0
setuptools : 69.2.0
pip : 24.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

lithomas1 commented 3 days ago

Thanks for reporting this.

I think we spotted this in our CI for pandas 2.2.x as well.

@mroeschke Sorry to ping but do you know what's going wrong? Looking at pytz's release it looks like all they did was update their tzdata to IANA 2024b. Maybe we have a conflict between pytz and zoneinfo/tzdata?

mroeschke commented 3 days ago

Given the documentation in https://pandas.pydata.org/docs/user_guide/timeseries.html#working-with-time-zones (in the Note below), it seems that the pytz 2024.2 behavior is more correct than the previous behavior.
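As a rough illustration of why an index and a scalar can report different time zone objects: a single DatetimeIndex may hold timestamps with different UTC offsets, whereas a Timestamp has exactly one. The sketch below assumes pandas is installed and that "CET" observes daylight saving time in 2000; the dates are chosen only for illustration.

```python
from datetime import timedelta

import pandas as pd

# One winter and one summer date in the same time zone.
idx = pd.DatetimeIndex(["2000-01-01", "2000-07-01"]).tz_localize("CET")

# Each element carries its own UTC offset, so no single fixed-offset
# tzinfo instance can describe the whole index.
offsets = [ts.utcoffset() for ts in idx]

print(offsets)
```

With a zone that observes DST, the two offsets differ by one hour, which is why the index-level .tz cannot simply be the localized tzinfo of any one element.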

I don't exactly recall seeing this in the 2.2.x branch.

Additionally, the pandas main branch has already removed support for interpreting CET as a pytz timezone and will now infer it as a zoneinfo timezone, so I'm not sure this is likely to be fixed.
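A possible way to avoid the pytz-specific repr on pandas 2.2.x, sketched here as an assumption rather than a recommended fix, is to pass a zoneinfo.ZoneInfo object explicitly instead of the string "CET". This assumes Python 3.9+ and that tz data for the key "CET" is available on the system or through the tzdata package.

```python
from zoneinfo import ZoneInfo

import pandas as pd

# Build the zone explicitly so pandas does not pick a pytz object for "CET".
cet = ZoneInfo("CET")
t0 = pd.Timestamp("2000-01-01")

ts_tz = t0.tz_localize(cet).tz
idx_tz = pd.DatetimeIndex([t0]).tz_localize(cet).tz

# Both objects should report the same zoneinfo time zone.
print(repr(ts_tz))
print(repr(idx_tz))
```

ZoneInfo objects are not specialized per UTC offset the way localized pytz tzinfo instances are, so the scalar and the index should show the same repr here.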