pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

polyfit time coordinate treatment in 2024.10.0 #9769

Closed JGuetschow closed 1 day ago

JGuetschow commented 1 day ago

What happened?

When using DataArray.polyfit on a DataArray with a datetime64 time coordinate, the returned coefficients do not describe the data on its original time axis but on an axis shifted such that t_min = 0. The coefficients therefore cannot be used in xr.polyval directly. This shift happens in _floatize_x (https://github.com/pydata/xarray/blob/91962d6aec380cb83fe80b2afdfa556efdd817a3/xarray/core/missing.py#L585). With xarray 2024.9.0 the problem does not exist.
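
To illustrate why shifted coefficients cannot be evaluated on the original axis, here is a minimal NumPy-only sketch. It does not call xarray and is not xarray's implementation. It assumes days since the Unix epoch as the numeric time unit (chosen for readability) and uses the same yearly timestamps as the example below. The variable names are made up for this illustration.

```python
import numpy as np

# Yearly timestamps 1956-01-01 .. 1966-01-01 and linearly increasing values.
times = np.arange("1956", "1967", dtype="datetime64[Y]").astype("datetime64[D]")
values = np.linspace(6, 12, 11)

# Numeric time axis: days since 1970-01-01 (negative, the data predate 1970).
x_abs = times.astype("int64").astype(float)
# The same axis shifted so that the first timestamp is 0 (t_min = 0).
x_shifted = x_abs - x_abs.min()

# Fit a straight line against the shifted axis.
coeffs_shifted = np.polyfit(x_shifted, values, deg=1)

i = 1  # index of 1957-01-01
# Evaluating on the shifted axis is consistent with the fit.
consistent = np.polyval(coeffs_shifted, x_shifted[i])
# Evaluating the same coefficients on the unshifted axis is not.
mismatched = np.polyval(coeffs_shifted, x_abs[i])

print(f"evaluated on shifted axis:   {consistent}")
print(f"evaluated on unshifted axis: {mismatched}")
```

The fitted line has a slope of about 0.6 per year and an intercept of about 6, so evaluating it at 1957 measured from 1970 gives roughly 6 + 0.6 × (1957 − 1970) = −1.8, which is close to the value in the 2024.10.0 log output below.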

What did you expect to happen?

I expected the coefficients of the polynomial to be consistent with the data, so that they can be used in xr.polyval directly.

Minimal Complete Verifiable Example

import numpy as np
import xarray as xr
import pandas as pd

# with 2024.9.0 numbers are similar, with 2024.10.0 not

test_ts = xr.DataArray(
    np.linspace(6, 12, 11),
    coords={"time": pd.date_range("1956-01-01", "1966-01-01", freq="YS")},
    dims="time",
    name="test_ts",
)

time_to_eval = np.datetime64('1957-01-01')

fit = test_ts.polyfit(dim='time', deg=1, skipna=True)
value = xr.polyval(
    test_ts.coords['time'].loc[{'time': [time_to_eval]}],
    fit.polyfit_coefficients
)

print(f"computed_value: {value.data}")
print(f"expected value: {test_ts.loc[{'time': [time_to_eval]}].data}")

Relevant log output

# xarray 2024.9.0
computed_value: [6.60068707]
expected value: [6.6]

# xarray 2024.10.0
computed_value: [-1.79982014]
expected value: [6.6]

Anything else we need to know?

My original issue in the primap2 package is https://github.com/primap-community/primap2/issues/293 (just FYI; I think I have included all necessary information here).

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0]
python-bits: 64
OS: Linux
OS-release: 6.8.0-47-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.4
libnetcdf: None

xarray: 2024.10.0
pandas: 2.2.3
numpy: 1.26.4
scipy: 1.14.1
netCDF4: None
pydap: None
h5netcdf: 1.4.0
h5py: 3.12.1
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.9.2
cartopy: None
seaborn: None
numbagg: 0.8.2
fsspec: None
cupy: None
pint: 0.24.3
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.4.0
pip: 24.2
conda: None
pytest: None
mypy: None
IPython: 8.29.0
sphinx: None

keewis commented 1 day ago

Thanks for the report!

Will this be fixed by #9691, or is this a separate issue?

JGuetschow commented 1 day ago

From looking at the tests added in #9691, I think it will (one test is basically my failing example). I only looked at the open issues and could not find a matching one. Great that it's already fixed.

keewis commented 1 day ago

Well, now we have one!

I'll close this as fixed by #9691, then.