non-uniform datetime type

JGuetschow commented 5 days ago

Describe the bug

Depending on the function the format of the returned time coordinates seems to differ. While a dataset created with e.g. pd.daterange() has the type np.datatime64[ns] , the return value of test_ts.coords['time'].loc[{'time': [np.datetime64('1970-01-01')]}] has the datatype np.datetime64[D]

Failing Test Below a test I constructed to show the behaviour. It fails because (I think) the internal date conversion of xr.polyfit and xr.polyval use the format of the np.datetime64 variables to convert the dates to integers used for the actual fitting and evaluation. The example below works if the dates are manually specified as having nanosecond resolution.

import numpy as np
import pandas as pd
import pytest
import xarray as xr
from primap2.tests.utils import assert_aligned_equal

def test_temp_polyval():
    test_ts = xr.DataArray(
        np.linspace(6, 12, 11),
        coords={"time": pd.date_range("1970-01-01", "1980-01-01", freq="YS")},
        dims="time",
        name="test_ts",
    )

    fit = test_ts.polyfit(dim='time', deg=1, skipna=True)
    value = xr.polyval(
        test_ts.coords['time'].loc[{'time': [np.datetime64('1970-01-01')]}],  # specify 'ns' for test to pass
        fit.polyfit_coefficients
    )

    value.name='test_ts' # for assertion

    assert_aligned_equal(
        test_ts.loc[{'time': [np.datetime64('1970-01-01')]}],  # specify 'ns' for test to pass
        value,
        rtol=1e-04,
    )

Expected behavior

It would be good if the date resolution is fixed without having to specify it manually. It would also be good if it's e.g. days and not nanoseconds (but not necessary unless nanoseconds has a negative impact on the fits)

System (please complete the following information):

Linux mint 22

Additional context

In the filling strategy code I'm currently writing I use nanosecond for now, because that seems to be the default behaviour of pd.date_range. But a pr.loc[{'time': ['1999-01-01]}] gives days back, so it's not a general solution to just be specific in the new code.

JGuetschow commented 5 days ago

I think this could also be considered a bug in xarray, unless I'm doing something wrong.

JGuetschow commented 5 days ago

The fix for the test above only works for the specific dates in the test. 1970-01-01 is special as it is zero for the internal time. If we replace 1970-1980 it by e.g. 1956-1966 the test fails.

JGuetschow commented 5 days ago

I found the reason for the problem. It's not the time resolution as I first suspected but a coordinate recomputation in xr.polyfit. The function _floatize_x shifts the time coordinate such that the minimal value is 0 to avoid accuracy problems as np.datetime64 uses 64 bit integers which can not be represented by 64 bit floats with the necessary accuracy. All further computations are done with the shifted coordinates and thus the coefficients are also computed in relation to the shifted coordinates. As 1970 is zero this case worked while other dates did not work. The following code takes the shifting of _floatize_x into account:

def test_temp_polyval():
    test_ts = xr.DataArray(
        np.linspace(6, 12, 11),
        coords={"time": pd.date_range("1956-01-01", "1966-01-01", freq="YS")},
        dims="time",
        name="test_ts",
    )

    time_to_eval = np.datetime64('1957-01-01')

    fit = test_ts.polyfit(dim='time', deg=1, skipna=True)
    value = xr.polyval(
        test_ts.coords['time'].loc[{'time': [time_to_eval]}]-test_ts.coords['time'].data[0],
        fit.polyfit_coefficients
    )

    value.name='test_ts' # for assertion

    assert_aligned_equal(
        test_ts.loc[{'time': [time_to_eval]}],
        value,
        rtol=1e-03,
    )
    return None

I still think this is very weird behaviour. Maybe I missed a hint when reading the docs.

JGuetschow commented 4 days ago

I've opened a second issue for the xr.polyfit problem (#293 )

znichollscr commented 4 days ago

I'd push this upstream to xarray. I can't believe this behaviour is intentional

JGuetschow commented 4 days ago

After thinking a bit about the code in xr.polyfit, I think we should definitely try to move away from nanosecond precision to e.g. hours or minutes. This should improve the accuracy of the fits

primap-community / primap2

non-uniform datetime type #292