pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: rounding dates to 30mins does not work #57002

Open Andre-Medina opened 10 months ago

Andre-Medina commented 10 months ago


Reproducible Example

import pandas as pd

dates = pd.Series([pd.to_datetime("2022-01-11 12:10:00")])

# date should be rounded but is not
dates.round("30min")
# >0   2022-01-11 12:10:00
# >dtype: datetime64[ns]

# Rounding a single Timestamp works
dates[0].round("30min")
# >Timestamp('2022-01-11 12:00:00')

Issue Description

This occurs with the latest version of pandas (2.2.0): rounding dates to 30-minute increments has no effect when the dates are in a Series. This worked in pandas 2.1.4.


Expected Behavior

Dates in a Series should be rounded to 30-minute increments, as happens for a single Timestamp:

import pandas as pd

dates = pd.Series([pd.to_datetime("2022-01-11 12:10:00")])

# Rounding a single Timestamp works
dates[0].round("30min")

Installed Versions

INSTALLED VERSIONS
------------------
commit : 53525ea1c333579ee612244ddea4958d900844fc
python : 3.10.0.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.133.1-microsoft-standard-WSL2
Version : #1 SMP Thu Oct 5 21:02:42 UTC 2023
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 3.0.0.dev0+152.g53525ea1c3
numpy : 1.26.2
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : None
pytest : 7.4.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.1.9
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.18.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2023.12.2
gcsfs : 2023.12.2post1
matplotlib : 3.8.2
numba : 0.58.1
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 14.0.1
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

MarcoGorelli commented 10 months ago

Thanks @Andre-Medina for the report. This needs fixing; running a git bisect now.

MarcoGorelli commented 10 months ago

From git bisect, this was introduced in #56767, @phofl

phofl commented 10 months ago

I am not sure that it is sensible to fix this. The new behavior aligns with DataFrame, which didn't support this before either.

dates.dt.round("30min")

That's what you want to do. We can theoretically fix this, but then we should address the DataFrame behavior as well.
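
For anyone landing here, a minimal sketch of the `.dt` accessor workaround suggested above. The second timestamp (12:20) is an extra illustrative value that is not in the original report; it is added only to show rounding in both directions.

```python
import pandas as pd

# First value is from the report; the second is an illustrative addition.
dates = pd.Series(pd.to_datetime(["2022-01-11 12:10:00", "2022-01-11 12:20:00"]))

# Series.dt.round applies datetime rounding to each element,
# unlike Series.round, which is the numeric (decimals) rounding method.
rounded = dates.dt.round("30min")

print(rounded)
# 0   2022-01-11 12:00:00
# 1   2022-01-11 12:30:00
# dtype: datetime64[ns]
```

This should give the same per-element result as calling `.round("30min")` on each Timestamp individually, without relying on `Series.round` accepting a frequency string.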

valeriupredoi commented 7 months ago

hello 🐼 folks, you guys gonna fix this in 2.2.3 please? Cheers muchly 🍺

Andre-Medina commented 5 months ago

Re-raising this issue as it is still broken.