Open JGuetschow opened 5 days ago
I think this could also be considered a bug in xarray, unless I'm doing something wrong.
The fix for the test above only works for the specific dates in the test. 1970-01-01 is special because it is zero in the internal time representation. If we replace 1970-1980 with e.g. 1956-1966, the test fails.
I found the reason for the problem. It's not the time resolution, as I first suspected, but a coordinate recomputation in xr.polyfit. The function _floatize_x shifts the time coordinate such that the minimal value is 0 to avoid accuracy problems, as np.datetime64 uses 64-bit integers, which cannot be represented by 64-bit floats with the necessary accuracy. All further computations are done with the shifted coordinates, and thus the coefficients are also computed relative to the shifted coordinates. Because 1970-01-01 is zero, that case worked, while other dates did not. The following code takes the shifting of _floatize_x into account:
import numpy as np
import pandas as pd
import xarray as xr

# assert_aligned_equal is the project's own test helper and must be imported separately.


def test_temp_polyval():
    test_ts = xr.DataArray(
        np.linspace(6, 12, 11),
        coords={"time": pd.date_range("1956-01-01", "1966-01-01", freq="YS")},
        dims="time",
        name="test_ts",
    )
    time_to_eval = np.datetime64("1957-01-01")
    fit = test_ts.polyfit(dim="time", deg=1, skipna=True)
    # Subtract the first time stamp so the evaluation point is expressed
    # relative to the shifted coordinate used for the fit.
    value = xr.polyval(
        test_ts.coords["time"].loc[{"time": [time_to_eval]}]
        - test_ts.coords["time"].data[0],
        fit.polyfit_coefficients,
    )
    value.name = "test_ts"  # for assertion
    assert_aligned_equal(
        test_ts.loc[{"time": [time_to_eval]}],
        value,
        rtol=1e-03,
    )
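To make the mechanism easier to see without xarray, here is a minimal standard-library sketch. It is only an illustration of the shift described above, not xarray's implementation: `linear_fit` is a hypothetical helper, and the time axis is expressed in days since 1970-01-01 purely for readability.

```python
from datetime import date


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Yearly time axis 1956..1966, expressed as days since 1970-01-01.
epoch = date(1970, 1, 1)
days = [(date(year, 1, 1) - epoch).days for year in range(1956, 1967)]
values = [6 + 0.6 * i for i in range(11)]  # same values as np.linspace(6, 12, 11)

# Fit on the shifted axis (minimum moved to 0), mimicking the shift described above.
shifted = [d - days[0] for d in days]
slope, intercept = linear_fit(shifted, values)

target = days[1]  # 1957-01-01
wrong = slope * target + intercept  # evaluated on the unshifted axis
right = slope * (target - days[0]) + intercept  # evaluated on the shifted axis
```

With a time axis starting at 1970-01-01, `days[0]` would be 0 and both evaluations would agree, which matches the observation that only the 1970 case worked.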
I still think this is very weird behaviour. Maybe I missed a hint when reading the docs.
I've opened a second issue for the xr.polyfit problem (#293).
I'd push this upstream to xarray. I can't believe this behaviour is intentional.
After thinking a bit about the code in xr.polyfit, I think we should definitely try to move away from nanosecond precision to e.g. hours or minutes. This should improve the accuracy of the fits.
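As a rough illustration of the accuracy argument, the following standard-library sketch compares how well a 64-bit float can hold a 1956 time stamp counted in nanoseconds versus hours since 1970-01-01. It does not call xarray or NumPy, and the variable names are made up for this example.

```python
import math
from datetime import datetime, timedelta, timezone

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
start = datetime(1956, 1, 1, tzinfo=timezone.utc)

# Whole seconds between the epoch and 1956-01-01 (negative, as it is before 1970).
seconds = (start - epoch) // timedelta(seconds=1)
as_ns = seconds * 1_000_000_000  # integer nanoseconds since the epoch
as_hours = seconds // 3600       # integer hours since the epoch

# Does stepping the integer by one unit still change the float value?
ns_step_survives = float(as_ns + 1) != float(as_ns)
hour_step_survives = float(as_hours + 1) != float(as_hours)

# Gap between neighbouring float64 values at the nanosecond magnitude.
ns_spacing = math.ulp(float(as_ns))
```

At nanosecond magnitude a single-unit step is lost in the float conversion, while at hour magnitude it is preserved. Note that `math.ulp` requires Python 3.9 or newer.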
Describe the bug
Depending on the function, the format of the returned time coordinates seems to differ. While a dataset created with e.g. pd.date_range() has the type np.datetime64[ns], the return value of test_ts.coords['time'].loc[{'time': [np.datetime64('1970-01-01')]}] has the datatype np.datetime64[D].
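The unit difference can be reproduced with NumPy alone. This small sketch only shows that the same calendar date carries different units and different underlying integers depending on how the np.datetime64 value is created; it makes no claim about how xarray handles these values internally.

```python
import numpy as np

day_stamp = np.datetime64("1971-01-01")       # unit inferred from the string: days
ns_stamp = np.datetime64("1971-01-01", "ns")  # unit requested explicitly

# Both describe the same instant and compare equal ...
same_instant = day_stamp == ns_stamp

# ... but the integers behind them count different units since 1970-01-01.
day_int = day_stamp.astype("int64")  # days since the epoch
ns_int = ns_stamp.astype("int64")    # nanoseconds since the epoch
```

Any code that turns such values into plain integers without first agreeing on a unit would see numbers that differ by many orders of magnitude for the same date.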
Failing Test
Below is a test I constructed to show the behaviour. It fails because (I think) the internal date conversions of xr.polyfit and xr.polyval use the format of the np.datetime64 variables to convert the dates to integers used for the actual fitting and evaluation. The example below works if the dates are manually specified as having nanosecond resolution.
Expected behavior
It would be good if the date resolution were fixed without having to specify it manually. It would also be good if it were e.g. days and not nanoseconds (though that is not necessary unless nanoseconds have a negative impact on the fits).
System (please complete the following information):
Linux Mint 22
Additional context
In the filling strategy code I'm currently writing, I use nanoseconds for now, because that seems to be the default behaviour of pd.date_range. But a pr.loc[{'time': ['1999-01-01']}] gives days back, so just being specific in the new code is not a general solution.