pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.51k stars 17.88k forks

Series.dt.round fails if fraction is exactly 0.5 #28408

Closed Els-K closed 5 years ago

Els-K commented 5 years ago

Code Sample, a copy-pastable example if possible

import pandas as pd
rng = pd.date_range('1/1/2018 11:59:00.050', periods=10, freq='100ms')
rng
>>>DatetimeIndex(['2018-01-01 11:59:00.050000', '2018-01-01 11:59:00.150000',
                  '2018-01-01 11:59:00.250000', '2018-01-01 11:59:00.350000',
                  '2018-01-01 11:59:00.450000', '2018-01-01 11:59:00.550000',
                  '2018-01-01 11:59:00.650000', '2018-01-01 11:59:00.750000',
                  '2018-01-01 11:59:00.850000', '2018-01-01 11:59:00.950000'],
                 dtype='datetime64[ns]', freq='100L')
rng.round('100ms')
>>>DatetimeIndex(['2018-01-01 11:59:00', '2018-01-01 11:59:00.200000',
                  '2018-01-01 11:59:00.200000', '2018-01-01 11:59:00.400000',
                  '2018-01-01 11:59:00.400000', '2018-01-01 11:59:00.600000',
                  '2018-01-01 11:59:00.600000', '2018-01-01 11:59:00.800000',
                  '2018-01-01 11:59:00.800000',        '2018-01-01 11:59:01'],
                 dtype='datetime64[ns]', freq=None)

Problem description

If the time fraction is exactly 0.5, the value is rounded half up or half down depending on whether the digit to its left is odd or even. So, '11:59:00.350' is rounded up (to '11:59:00.400') whilst '11:59:00.450' is rounded down (also to '11:59:00.400').
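The pattern in the output above matches round-half-to-even applied to integer offsets. The following is a minimal pure-Python sketch of that rule, not pandas' actual implementation; the helper name `round_half_even` and the choice to work in milliseconds past 11:59:00 are assumptions made for illustration.

```python
def round_half_even(value: int, unit: int) -> int:
    """Round value to the nearest multiple of unit; exact ties go to the even multiple."""
    quotient, remainder = divmod(value, unit)
    twice = 2 * remainder
    # Round up when past the midpoint, or exactly at it with an odd quotient.
    if twice > unit or (twice == unit and quotient % 2 == 1):
        quotient += 1
    return quotient * unit


# Offsets in milliseconds past 11:59:00, as in the report: 50, 150, ..., 950.
offsets_ms = list(range(50, 1000, 100))
rounded = [round_half_even(ms, 100) for ms in offsets_ms]
print(rounded)  # [0, 200, 200, 400, 400, 600, 600, 800, 800, 1000]
```

Every input here is an exact tie, so each one lands on the neighbouring even multiple of 100 ms, which is why adjacent timestamps collapse into pairs.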

Expected Output

Expected output would be either:

>>>DatetimeIndex(['2018-01-01 11:59:00', '2018-01-01 11:59:00.100000',
                  '2018-01-01 11:59:00.200000', '2018-01-01 11:59:00.300000',
                  '2018-01-01 11:59:00.400000', '2018-01-01 11:59:00.500000',
                  '2018-01-01 11:59:00.600000', '2018-01-01 11:59:00.700000',
                  '2018-01-01 11:59:00.800000', '2018-01-01 11:59:00.900000'],
                 dtype='datetime64[ns]', freq=None)

or:

>>>DatetimeIndex(['2018-01-01 11:59:00.100000', '2018-01-01 11:59:00.200000',
                  '2018-01-01 11:59:00.300000', '2018-01-01 11:59:00.400000',
                  '2018-01-01 11:59:00.500000', '2018-01-01 11:59:00.600000',
                  '2018-01-01 11:59:00.700000', '2018-01-01 11:59:00.800000',
                  '2018-01-01 11:59:00.900000',        '2018-01-01 11:59:01'],
                 dtype='datetime64[ns]', freq=None)

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 7
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None

pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.0.1
Cython : 0.29.13
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.7
tables : 3.5.2
xarray : 0.12.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8

TomAugspurger commented 5 years ago

I haven't looked at the example, but is this just two's-complement rounding? https://en.wikipedia.org/wiki/Two%27s_complement

datajanko commented 5 years ago

Actually, this is the same as Python's built-in rounding behavior.
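For comparison, here is a short sketch of how Python's built-in `round()` treats exact ties, next to the standard-library `decimal` module, which lets the tie rule be chosen explicitly. The helper `quantize_all` is a name introduced only for this example.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# These floats are exactly representable in binary, so each is a true tie.
ties = [0.5, 1.5, 2.5, 3.5, 4.5]
builtin = [round(x) for x in ties]
print(builtin)  # [0, 2, 2, 4, 4]


def quantize_all(values, mode):
    """Round each decimal string to an integer using the given tie rule."""
    return [int(Decimal(v).quantize(Decimal("1"), rounding=mode)) for v in values]


tie_strings = ["0.5", "1.5", "2.5", "3.5", "4.5"]
half_even = quantize_all(tie_strings, ROUND_HALF_EVEN)
half_up = quantize_all(tie_strings, ROUND_HALF_UP)
print(half_even)  # [0, 2, 2, 4, 4]
print(half_up)    # [1, 2, 3, 4, 5]
```

The built-in results agree with `ROUND_HALF_EVEN`, which is the same pairing seen in the `rng.round('100ms')` output in the report.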

mroeschke commented 5 years ago

This was an intentional change in 0.24.0: https://pandas.pydata.org/pandas-docs/version/0.24.2/whatsnew/v0.24.0.html#datetimelike

Here was the PR with the behavior fix: https://github.com/pandas-dev/pandas/pull/22802

Els-K commented 5 years ago

Hi,

this rounding behavior makes sense in many cases, in particular from a statistical point of view. However, at least for my application in time-series/dataframe analysis (created from several measurement devices that are not 100% synchronized, often several months of data, 1-10 Hz sampling frequency), it does not, as my formerly unique timestamps get duplicated. Of course, I can easily solve the problem in my case by adding some time-delta that forces rounding up/down, but it took me a while to understand the problem. I think it would be great to have at least some note in the docs to inform about this behavior, or even better, have an option to choose the kind of rounding, depending on one’s needs.

Best regards, Katharina Elsen
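
One possible form of the time-delta workaround mentioned above, sketched with `DatetimeIndex.floor` and `DatetimeIndex.ceil`: shifting by half the rounding unit before flooring gives round-half-up, and shifting back before taking the ceiling gives round-half-down. This is an illustration under the assumption of a fixed 100 ms unit; the variable names are made up for the example and nothing here is a built-in rounding-mode option of `round`.

```python
import pandas as pd

rng = pd.date_range("2018-01-01 11:59:00.050", periods=10, freq="100ms")
half = pd.Timedelta("50ms")  # half of the 100 ms rounding unit

default = rng.round("100ms")             # ties go to the even multiple
half_up = (rng + half).floor("100ms")    # ties always round up
half_down = (rng - half).ceil("100ms")   # ties always round down

# The default result contains duplicates; the two shifted variants do not.
print(default.is_unique, half_up.is_unique, half_down.is_unique)
```

For this input, `half_up` runs from 11:59:00.100 to 11:59:01 and `half_down` from 11:59:00 to 11:59:00.900, matching the two expected outputs listed in the report.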

TomAugspurger commented 5 years ago

Can you make a pull request updating the docs? Probably in timeseries.rst
