pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: PeriodIndex loses tz information without emitting a warning that the timezone is dropped. #47005

Open joooeey opened 2 years ago

joooeey commented 2 years ago

Pandas version checks

Reproducible Example

import pandas as pd

pd.period_range("2022-01-01 06:00:00+02:00", "2022-01-01 09:00:00+02:00", freq="H")

Out[17]: 
PeriodIndex(['2022-01-01 06:00', '2022-01-01 07:00', '2022-01-01 08:00',
             '2022-01-01 09:00'],
            dtype='period[H]')

Issue Description

Creating a PeriodIndex with pd.period_range from timezone-aware inputs produces a timezone-naive PeriodIndex without even emitting a warning that the timezone is dropped.
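For comparison, here is a minimal sketch that passes the same inputs to pd.date_range and to pd.period_range. It assumes that the offset object pd.offsets.Hour() is an acceptable stand-in for the "H" alias used in the report (the alias spelling differs between pandas versions); the variable names are illustrative only.

```python
import pandas as pd

# Hourly frequency as an offset object, so the example does not depend on
# how the string alias is spelled in a given pandas version.
hourly = pd.offsets.Hour()

start = "2022-01-01 06:00:00+02:00"
end = "2022-01-01 09:00:00+02:00"

# DatetimeIndex: the UTC offset parsed from the inputs is kept on the result.
dti = pd.date_range(start, end, freq=hourly)
print(dti.tz)

# PeriodIndex: same inputs, but the result carries no time zone.
pi = pd.period_range(start, end, freq=hourly)
print(pi)
```

The first print shows a fixed UTC+02:00 offset, while the PeriodIndex repr has nowhere to show one.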

This appears to have the same root cause as #28039 and #21333. There is a fix for #21333 in #22549 (i.e. we warn now) but to me that seems too far upstream. This needs to be fixed in the PeriodIndex constructor.
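To make the "we warn now" part concrete, the sketch below records warnings around Timestamp.to_period and around pd.period_range. It assumes that the warning meant here is the UserWarning that recent pandas versions emit from Timestamp.to_period for time-zone aware timestamps; the helper name collect_warning_messages is hypothetical and not part of pandas.

```python
import warnings

import pandas as pd

hourly = pd.offsets.Hour()


def collect_warning_messages(func):
    """Hypothetical helper: call func() and return the warning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func()
    return [str(w.message) for w in caught]


ts = pd.Timestamp("2022-01-01 06:00:00+02:00")

# Converting a single tz-aware Timestamp goes through the code path that warns.
to_period_messages = collect_warning_messages(lambda: ts.to_period(hourly))

# Building the index from tz-aware strings is the path this report is about.
period_range_messages = collect_warning_messages(
    lambda: pd.period_range(
        "2022-01-01 06:00:00+02:00", "2022-01-01 09:00:00+02:00", freq=hourly
    )
)

print("to_period:", to_period_messages)
print("period_range:", period_range_messages)
```

Comparing the two lists on a given installation shows whether the time zone warning reaches the period_range path in that version.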

Expected Behavior

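The good option, analogous to DatetimeIndex:

  • Whenever a PeriodIndex is created with all arguments in the same time zone, as in the example above, the resulting PeriodIndex acquires that time zone.

The lazy option, which I consider urgent to implement:

  • Whenever a PeriodIndex is created with at least one time-zone aware argument, a warning is raised saying that the result will be time-zone naive. This is crucial to avoid bugs.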

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 945c9ed766a61c7d2c0a7cbb251b6edebf9cb7d5
python           : 3.9.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.0-97-generic
Version          : #110-Ubuntu SMP Thu Jan 13 18:22:13 UTC 2022
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.3.4
numpy            : 1.19.5
pytz             : 2021.3
dateutil         : 2.8.2
pip              : 21.2.4
setuptools       : 58.0.4
Cython           : 0.29.24
pytest           : 6.2.5
hypothesis       : None
sphinx           : 4.2.0
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.0.3
IPython          : 7.30.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.5.0
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.6.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : 0.18.2
xlrd             : None
xlwt             : None
numba            : None

simonjayhawkins commented 2 years ago

Thanks @joooeey for the report.

Expected Behavior

The good option, analogous to DatetimeIndex:

  • Whenever a PeriodIndex is created with all arguments in the same time-zone like in the example above, the resulting PeriodIndex acquires that time zone.

There is quite some discussion in #45736 regarding supporting timezone-aware Period.

The lazy option, which I consider urgent to implement:

  • Whenever a PeriodIndex is created with at least one time-zone aware argument, a warning is raised saying that the result will be time-zone naive. This is crucial to avoid bugs.

Note

Whatever solution we go with, this should be fixed in the constructor of PeriodIndex, not further up in pd.period_range, Timestamp.to_period or similar.
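Until something along those lines exists in pandas itself, the requested warning can be approximated in user code. The following is only a sketch of such a stopgap, not the constructor-level fix asked for in the report: period_range_tz_checked is a hypothetical wrapper that is not part of pandas, and it assumes that pd.Timestamp can parse the same start and end values that pd.period_range accepts.

```python
import warnings

import pandas as pd


def period_range_tz_checked(start=None, end=None, periods=None, freq=None):
    """Hypothetical user-side wrapper around pd.period_range (not a pandas API).

    Emits a UserWarning for each of start/end that is time-zone aware, then
    delegates to pd.period_range unchanged.
    """
    for label, value in (("start", start), ("end", end)):
        # Parse with Timestamp only to inspect the time zone; the value itself
        # is passed on untouched.
        if value is not None and pd.Timestamp(value).tz is not None:
            warnings.warn(
                f"{label}={value!r} is time-zone aware; the resulting "
                "PeriodIndex will be time-zone naive.",
                UserWarning,
                stacklevel=2,
            )
    return pd.period_range(start=start, end=end, periods=periods, freq=freq)


# Usage: same inputs as in the report, with the hourly frequency given as an
# offset object instead of the "H" alias.
idx = period_range_tz_checked(
    "2022-01-01 06:00:00+02:00",
    "2022-01-01 09:00:00+02:00",
    freq=pd.offsets.Hour(),
)
print(idx)
```

The wrapper only reports the drop; it does not change the result, so callers who need a specific wall-clock time still have to convert their inputs explicitly before building the index.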

To avoid duplicating discussion in more than one place, this issue could be left open if we consider this the way forward to close this issue.