pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.63k stars 1.09k forks source link

Temporal polyfit does not use the same coordinate as polyval #9690

Closed aulemahal closed 3 weeks ago

aulemahal commented 4 weeks ago

What happened?

When fitting a variable along a temporal dimension, results from computing the fitted "trend" with xr.polyval were very different from the original variable. It seems that the function is not evaluated on the same coordinate the fit was made on.

What did you expect to happen?

The fit should be made on the same coordinate as the evaluation.

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np
import matplotlib.pyplot as plt

da = xr.DataArray(np.arange(100), dims=('time',), coords={'time': xr.date_range('2001-01-01', periods=100, freq='YS')})

fit = da.polyfit('time', deg=3)

val = xr.polyval(da.time, fit.polyfit_coefficients)

print(da[0], val[0])
# 0,  31.00174342
# The data is linear, the fit should be exact.

plt.plot(da.time, da, label='Original')
plt.plot(da.time, val, label='Fit')

MVCE confirmation

Relevant log output

No response

Anything else we need to know?

xr.polyval transforms the temporal coordinate here : https://github.com/pydata/xarray/blob/dbb98b4a40d1679800f7f85d0dad59ef60b5b790/xarray/core/computation.py#L2070

And da.polyfit does that here : https://github.com/pydata/xarray/blob/dbb98b4a40d1679800f7f85d0dad59ef60b5b790/xarray/core/dataset.py#L9069-L9077

The results are similar, however, in polyfit the new x starts at 0, while this offsetting is not done in polyval.

This bug was introduced in #9369 I believe. I added a review there woops :sweat:. Would it be possible to use the same function in both polyfit and polyval ?

I can make a PR. My intuition would be to use xr.core.computation._ensure_numeric in da.polyfit.

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.12.7 | packaged by conda-forge | (main, Oct 4 2024, 16:05:46) [GCC 13.3.0] python-bits: 64 OS: Linux OS-release: 6.11.5-200.fc40.x86_64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: fr_CA.UTF-8 LOCALE: ('fr_CA', 'UTF-8') libhdf5: 1.14.3 libnetcdf: 4.9.2 xarray: 2024.10.0 pandas: 2.2.3 numpy: 2.0.2 scipy: 1.14.1 netCDF4: 1.7.1 pydap: None h5netcdf: 1.4.0 h5py: 3.12.1 zarr: None cftime: 1.6.4 nc_time_axis: 1.4.1 iris: None bottleneck: 1.4.2 dask: 2024.10.0 distributed: 2024.10.0 matplotlib: 3.9.2 cartopy: None seaborn: None numbagg: None fsspec: 2024.10.0 cupy: None pint: 0.24.3 sparse: 0.16.0a10.dev3+ga73b20d flox: 0.9.12 numpy_groupies: 0.11.2 setuptools: 75.1.0 pip: 24.2 conda: None pytest: 8.3.3 mypy: 1.13.0 IPython: 8.29.0 sphinx: 8.1.3