Open miccoli opened 9 months ago
Another strange edge case, reported in #46819
>>> pd.to_timedelta(2.5225, 's').asm8
numpy.timedelta64(2522499999,'ns')
>>> decimal.Decimal(2.5225)*10**9
Decimal('2522499999.999999964472863212')
but
>>> pd.to_timedelta(2.5223, 's').asm8
numpy.timedelta64(2522300000,'ns')
>>> decimal.Decimal(2.5223)*10**9
Decimal('2522299999.999999986499688021')
Why is 2.5225 seconds truncated to 2522499999 nanoseconds, while 2.5223 seconds is rounded up to 2522300000 nanoseconds? Behaviour here seems quite erratic.
The problem should be in lines 223–225 and 228 below:
In fact
>>> 0.5225 * 1e9
522499999.99999994
>>> 0.5223 * 1e9
522300000.0
despite the fact that
>>> round(0.5225, 9) == 0.5225
True
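The snippets above can be combined into a small pure-Python model of the suspected code path. This is only a sketch under assumptions: the function name `model_to_ns` and the exact sequence of steps (split off the integer part, round the fraction to `p` digits, multiply in binary floating point, truncate) are inferred from the outputs quoted here, and the code is not copied from the pandas source.

```python
import math


def model_to_ns(seconds: float, p: int = 9) -> int:
    """Hypothetical model of the suspected conversion path (not pandas code)."""
    base = math.trunc(seconds)   # whole seconds
    frac = seconds - base        # fractional seconds, still a binary double
    frac = round(frac, p)        # yields the nearest double, not an exact decimal
    # The product is computed in binary floating point and then truncated,
    # so a value just below an integer loses a whole nanosecond.
    return base * 10**p + math.trunc(frac * 10.0**p)


print(model_to_ns(2.5225))  # 2522499999
print(model_to_ns(2.5223))  # 2522300000
```

Under these assumptions the model reproduces both results reported at the top of this comment.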
I would say that the bug is the naive assumption that
trunc(round(x, p) * 10**p)
gives the same results in decimal and binary floating point:
>>> math.trunc(round(0.5225, 6) * 10**6)
522499
>>> math.trunc(round(decimal.Decimal(0.5225), 6) * 10**6)
522500
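To see where the two results diverge, the exact value of the double can be inspected with `fractions.Fraction`, which converts a float without any further rounding. The variable names below are illustrative only.

```python
import math
from fractions import Fraction

x = round(0.5225, 6)             # the double nearest to 0.5225
exact = Fraction(x)              # exact rational value of that double

below = exact < Fraction("0.5225")           # the double lies just below 0.5225
truncated = math.trunc(exact * 10**6)        # truncating the exact product
nearest = round(exact * 10**6)               # rounding it to the nearest integer

print(below, truncated, nearest)  # True 522499 522500
```

Because the double sits slightly below the decimal value, any scheme that ends in a truncation loses one unit, while rounding the scaled value to the nearest integer does not.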
After some more experimentation I arrived at some preliminary remarks.
In to_timedelta there are two different possible paths:
1. p == 0
2. p > 0
So actually we have here two separate issues.
Point 1. could be considered a design choice: it is open to discussion whether other, more coherent behaviours (always round, always truncate) are worth the risk of breaking current code.
Point 2. is clearly a bug, which is orthogonal to 1. However, it makes no sense to submit a PR for 2. alone if 1. also has to be addressed.
Please see my POC implementation with correct rounding but preserving truncation in the ns → ns case: https://github.com/miccoli/pandas/tree/GH%2356629
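For readers who do not want to follow the link, the idea can be sketched in a few lines of plain Python. This is not the code from the linked branch: the function name `float_to_ns`, the `NS_PER_UNIT` table, and the use of exact rational arithmetic are all assumptions made for illustration.

```python
import math
from fractions import Fraction

# Hypothetical unit table: nanoseconds per unit.
NS_PER_UNIT = {"ns": 1, "us": 10**3, "ms": 10**6, "s": 10**9}


def float_to_ns(value: float, unit: str = "ns") -> int:
    """Convert a float duration to integer nanoseconds (illustrative sketch)."""
    # Scale the exact value of the double, avoiding a binary float product.
    exact = Fraction(value) * NS_PER_UNIT[unit]
    if unit == "ns":
        # Point 1: keep the current truncation when the input is already in ns.
        return math.trunc(exact)
    # Point 2: round to the nearest nanosecond, ties to even.
    return round(exact)


print(float_to_ns(2.5225, "s"))  # 2522500000
print(float_to_ns(2.5223, "s"))  # 2522300000
print(float_to_ns(2.9, "ns"))    # 2
```

With this approach, definitions that are equal within float precision, such as 1.75 ms and 1.75e-3 s, map to the same number of nanoseconds.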
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
The to_timedelta docs are not clear on how sub-nanosecond precision should be rounded/truncated when different unit values are used in the conversion.
Expected Behavior
Equivalent definitions [^1] of a timedelta should give rise to the same result. Current behaviour seems to be
unit='ns'
[^1]: Here “equivalent” is meant within float precision; in fact 1.75 == 1.75e-3 * 1000
This is very confusing, and can give rise to small glitches in the conversions.
Please note that this behaviour is not due to the fact that decimal literals have no “exact” float representation:
still fails with
although there is no rounding in 2000 / 1024 and 2 / 1024.
Please note also that truncating sub-nanosecond precision is not consistent with datetime.timedelta, which has µs resolution. In fact the docs clearly state that
Analogously, to_timedelta and the Timedelta constructor should always round to the nearest nanosecond using round-half-to-even tiebreaker when float args are used.
Installed Versions