pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: period[h] + column slicing does not work #60273

Open oxygenbilly opened 3 days ago

oxygenbilly commented 3 days ago

Pandas version checks

Reproducible Example

import pandas as pd

pr = pd.period_range('2024-01-01 00:00:00', '2024-01-01 02:00:00', freq='h')
df = pd.DataFrame(index=pr)
df['date'] = df.index.to_timestamp().floor('D')
df['hour'] = df.index.hour
df.index.name = 'value'
df = df.reset_index()
df = df.pivot(index='date', columns='hour', values='value')

print(df)
# hour                       0                 1                 2
# date                                                            
# 2024-01-01  2024-01-01 00:00  2024-01-01 01:00  2024-01-01 02:00

print(df[[0,1,2]])
# hour                       0                 1                 2
# date                                                            
# 2024-01-01  2024-01-01 00:00  2024-01-01 00:00  2024-01-01 00:00

Issue Description

When the column dtype is period[h], selecting columns with a list of labels does not produce the correct result. If the dtype is changed to object, the result is correct.
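A minimal sketch of the object-dtype workaround described above. It assumes the cast is applied to the `value` column before pivoting (the report does not say exactly where the cast was done), and the names `long_df`, `wide`, and `selected` are illustrative only.

```python
import pandas as pd

pr = pd.period_range("2024-01-01 00:00", "2024-01-01 02:00", freq="h")

# Long-format frame: one row per hourly period.
long_df = pd.DataFrame({"value": pr})
long_df["date"] = pr.to_timestamp().floor("D")
long_df["hour"] = pr.hour

# Assumption: casting to object before the pivot keeps the Period
# values out of the period[h] extension dtype entirely.
long_df["value"] = long_df["value"].astype(object)

wide = long_df.pivot(index="date", columns="hour", values="value")

# Select the columns with a list of labels, as in the report.
selected = wide[[0, 1, 2]]
row = selected.iloc[0].tolist()
print(row)
```

The cast trades the period[h] dtype for plain Python objects, so it is a stopgap rather than a fix.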

Expected Behavior

The expected behavior is that the columns are selected according to the hour labels [0, 1, 2]. Instead, the output above shows the values of column 0 repeated for all of columns 0, 1, and 2.
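The expected behavior can be expressed as a small check, sketched here under the assumption that the first row of the list selection should equal the original periods in order. The helper name `list_selection_is_correct` is hypothetical; per the report it would return False on an affected version such as pandas 2.2.2.

```python
import pandas as pd


def list_selection_is_correct() -> bool:
    """Return True if wide[[0, 1, 2]] yields the three distinct periods."""
    pr = pd.period_range("2024-01-01 00:00", "2024-01-01 02:00", freq="h")
    long_df = pd.DataFrame(
        {
            "value": pr,
            "date": pr.to_timestamp().floor("D"),
            "hour": pr.hour,
        }
    )
    wide = long_df.pivot(index="date", columns="hour", values="value")

    # Compare the list-based selection against the source periods.
    got = wide[[0, 1, 2]].iloc[0].tolist()
    return got == list(pr)


result = list_selection_is_correct()
print("list selection correct:", result)
```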

Installed Versions

INSTALLED VERSIONS
------------------
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.10.12.final.0
python-bits : 64
OS : Linux
OS-release : 6.1.85+
Version : #1 SMP PREEMPT_DYNAMIC Thu Jun 27 21:05:47 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.2
dateutil : 2.8.2
setuptools : 75.1.0
pip : 24.1.2
Cython : 3.0.11
pytest : 7.4.4
hypothesis : None
sphinx : 5.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.3.0
html5lib : 1.1
pymysql : None
psycopg2 : 2.9.10
jinja2 : 3.1.4
IPython : 7.34.0
pandas_datareader : 0.10.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : 1.4.2
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.10.0
gcsfs : 2024.10.0
matplotlib : 3.8.0
numba : 0.60.0
numexpr : 2.10.1
odfpy : None
openpyxl : 3.1.5
pandas_gbq : 0.24.0
pyarrow : 17.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.1
sqlalchemy : 2.0.36
tables : 3.8.0
tabulate : 0.9.0
xarray : 2024.10.0
xlrd : 2.0.1
zstandard : None
tzdata : 2024.2
qtpy : None
pyqt5 : None

rhshadrach commented 3 days ago

Thanks for the report! Confirmed on main. Further investigations and PRs to fix are welcome!

DhruvBShetty commented 2 days ago

Hello, @oxygenbilly, I would like to investigate this issue, if you aren't working on it.