cbertinato opened this issue 5 years ago
The root of the issue is in `cast_from_unit` (https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/tslibs/timedeltas.pyx#L275), so it could well affect other types. This is a classic case of floating-point representation error, but I think we should preserve the fractional second that the user intended.
One thought is to use the `decimal` module here:

```cython
timestamp = Decimal(str(ts))
base = Decimal(<int64_t>ts)
frac = timestamp - base
```
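As a pure-Python sketch of why this helps (`ts` here is a hypothetical float of seconds; the C-level `<int64_t>` cast corresponds to `int()`):

```python
from decimal import Decimal

# Hypothetical input: a float number of seconds whose intended fraction is 0.8 s.
ts = 1490193630.8

# Naive extraction: the subtraction is exact, but it exposes the binary
# representation error of the stored value, so the result is only
# approximately 0.8.
frac_float = ts - int(ts)

# Decimal(str(ts)) round-trips through repr(), which produces the shortest
# decimal string that maps back to the same float -- "1490193630.8" -- so
# the intended fraction 0.8 is recovered exactly.
frac_decimal = Decimal(str(ts)) - Decimal(int(ts))
```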
It does the job, but there is a performance penalty. Currently:

```
%timeit pd.to_timedelta(pd.Series([ts]*10000), unit='s')
28.1 ms ± 873 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

versus using `Decimal`:

```
%timeit pd.to_timedelta(pd.Series([ts]*10000), unit='s')
44.9 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
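A minimal, self-contained way to see where the extra cost comes from is to time just the two per-element fraction extractions (not the full pandas path); the `Decimal` variant pays for a `str()` round-trip plus two `Decimal` constructions on every element:

```python
import timeit
from decimal import Decimal

ts = 1490193630.8  # hypothetical float of seconds

# Per-element cost of the current float arithmetic...
t_float = timeit.timeit(lambda: ts - int(ts), number=100_000)

# ...versus the Decimal-based extraction, which adds a str() round-trip
# and two Decimal constructions per element.
t_decimal = timeit.timeit(lambda: Decimal(str(ts)) - Decimal(int(ts)),
                          number=100_000)

print(f"float:   {t_float:.4f} s")
print(f"Decimal: {t_decimal:.4f} s")
```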
This may also explain https://github.com/pandas-dev/pandas/pull/19732, which was chalked up to floating-point precision errors. We have a related warning here: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#epoch-timestamps

It would be great to have a more precise calculation here, but the performance impact of using `Decimal` is not too attractive.
#### Code Sample
However, this works:
#### Problem description

Depending on the value of the float being converted to a timestamp or timedelta, and on the unit, the resulting timestamp or timedelta occasionally misrepresents the fractional part of the input float.
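The effect can be reproduced without pandas. A `cast_from_unit`-style split of a hypothetical float of seconds into integer and fractional parts, followed by scaling the fraction to nanoseconds (pandas' internal resolution), lands off the intended value, while the `Decimal` route hits it exactly:

```python
from decimal import Decimal

ts = 1490193630.8  # hypothetical input; the user intends a fraction of exactly 0.8 s

# cast_from_unit-style split into integer seconds and fractional seconds:
base = int(ts)
frac = ts - base

# Scaling the float fraction to nanoseconds misses the intended value,
# because 0.8 has no exact binary representation
# (e.g. 799999952 rather than 800000000 on IEEE-754 doubles):
ns_float = int(frac * 10**9)

# Routing the split through Decimal recovers the intended 800_000_000 ns:
ns_decimal = int((Decimal(str(ts)) - Decimal(base)) * 10**9)
```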
#### Expected Output

```
Timedelta('17247 days 14:40:30.8')
```
#### Output of `pd.show_versions()`