pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Inconsistent interpolation based on data type #8773

Open wilson0028 opened 9 months ago

wilson0028 commented 9 months ago

What happened?

Depending on the data type (float64 vs float32), resample(...).interpolate("linear") gives different results.

What did you expect to happen?

The example code outputs two arrays, which are pasted below. The first array ends with a 1, while the second array, which comes from the float32 data, is all NaN. I expected each array to have six NaNs and one numerical value.

[nan nan nan nan nan nan 1.]
[nan nan nan nan nan nan nan]

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np
import pandas as pd

time_range = pd.date_range(start='2024-02-20T12', periods=2, freq='6H')

data1 = xr.DataArray([np.nan,1], dims='time', coords={'time': time_range})
data2 = data1.astype('float32')

print(data1.resample({"time":"1H"}).interpolate("linear").values)
print(data2.resample({"time":"1H"}).interpolate("linear").values)
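To see how two different-looking answers can both come out of "linear" interpolation of this series, here is a NumPy-only sketch. It is an illustration, not xarray's or scipy's actual code path, and it assumes the timestamps are converted to float hours since 2024-02-20T12:00. It evaluates the two-point formula y0 + (y1 - y0) * (t - t0) / (t1 - t0) directly, then calls numpy.interp on the same hourly grid. The two results have the same shape as the two arrays reported above (all NaN versus six NaNs and a trailing 1.0). That is suggestive, but it does not by itself show which path is taken for each dtype.

```python
import numpy as np

# The two samples from the report, written as hours since 2024-02-20T12:00
# (converting the timestamps to float hours is an assumption of this sketch).
t_known = np.array([0.0, 6.0])
y_known = np.array([np.nan, 1.0])

# Hourly target grid 12:00, 13:00, ..., 18:00 -> 0, 1, ..., 6 hours.
t_new = np.arange(7.0)

# 1) Evaluate the two-point linear formula directly.
#    y0 is NaN and appears in the expression, so every result is NaN,
#    including the point t = t1 where the answer "should" be y1.
formula = y_known[0] + (y_known[1] - y_known[0]) * (t_new - t_known[0]) / (
    t_known[1] - t_known[0]
)

# 2) numpy.interp returns the stored sample when a target hits a node exactly,
#    so the last point (t = t1) comes back as 1.0 instead of NaN.
via_np_interp = np.interp(t_new, t_known, y_known)

print(formula)        # all NaN
print(via_np_interp)  # six NaNs, then 1.0
```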

MVCE confirmation

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-513.11.1.el8_9.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development

xarray: 2024.1.1
pandas: 2.1.1
numpy: 1.26.2
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: 1.3.7
dask: 2023.10.0
distributed: 2023.10.0
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.9.2
cupy: None
pint: 0.22
sparse: 0.14.0
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.2
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None

welcome[bot] commented 9 months ago

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

max-sixty commented 9 months ago

Not sure what's causing this, but I can confirm that I can reproduce it. Any ideas?

mathause commented 9 months ago

Simpler repro (no datetime, no resample), but I don't know why it happens either:

import numpy as np
import xarray as xr

data1 = xr.DataArray([np.nan, 1], dims='x', coords={'x': [0, 6]})
data2 = data1.astype('float32')
target = [0, 6]
data1.interp(x=target)  # float64 data
data2.interp(x=target)  # float32 data: result differs from the float64 case

mathause commented 9 months ago

OK, this is a scipy problem. Do you want to raise an issue in scipy?

import numpy as np
from scipy.interpolate import interp1d

xi = np.array([0, 6])
yi = np.array([np.nan, 1])
# Same values, two dtypes: the results differ
interp1d(xi, yi, kind="linear")(xi)                     # float64 input
interp1d(xi, yi.astype(np.float32), kind="linear")(xi)  # float32 input

(The xarray question here is: why do we interpolate using scipy and not numpy?)
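For comparison with the scipy calls, a minimal sketch of the same two-dtype test with numpy.interp. numpy.interp converts its real-valued inputs to float64 before computing, so the float32 copy goes through exactly the same arithmetic as the float64 original and both calls return the same array. This only shows numpy's behavior on this data; it does not answer why xarray's interp goes through scipy.

```python
import numpy as np

xi = np.array([0, 6])
yi = np.array([np.nan, 1])

# numpy.interp works in float64 internally, so the input dtype of the
# sample values does not change the result here.
out64 = np.interp(xi, xi, yi)
out32 = np.interp(xi, xi, yi.astype(np.float32))

print(out64, out64.dtype)  # nan and 1.0, float64
print(out32, out32.dtype)  # identical to the line above
```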

wilson0028 commented 9 months ago

Issue now posted on scipy repo: https://github.com/scipy/scipy/issues/20152

wilson0028 commented 9 months ago

From the scipy documentation, "We note that scipy.interpolate does not support interpolation with missing data. Two popular ways of representing missing data are using masked arrays of the numpy.ma library, and encoding missing values as not-a-number, NaN."
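Within that restriction, the usual approach is to remove the missing samples before interpolating. A minimal sketch, using made-up data (not from this issue), a numpy masked array to mark the missing value, and numpy.interp standing in for a scipy interpolator:

```python
import numpy as np

# Hypothetical example data with one interior gap (not the data from the issue).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, np.nan, 2.0, 6.0])

# Mark the NaN as missing with a masked array, one of the two representations
# the scipy docs mention.
y_masked = np.ma.masked_invalid(y)
valid = ~np.ma.getmaskarray(y_masked)

# Interpolate from the valid samples only, so NaN never enters the arithmetic.
filled = np.interp(x, x[valid], y_masked.compressed())
print(filled)  # [0. 1. 2. 6.]
```

With the two-point series from this issue, only one valid sample would remain, and numpy.interp would return that value at every target point. Dropping NaNs therefore changes the semantics rather than reproducing the expected six-NaNs-plus-one output.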

If scipy does not support interpolation with missing data, does it follow that xarray does not support it either?
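For reference, xarray does have a method aimed specifically at missing data, DataArray.interpolate_na, which fills NaN gaps along a dimension from the neighboring valid values. Whether it fits the resampling use case in this issue is a separate question. A minimal sketch on made-up data (not from this issue), run for both dtypes:

```python
import numpy as np
import xarray as xr

# Hypothetical series with an interior gap (not the data from this issue).
da = xr.DataArray([0.0, np.nan, 2.0, 6.0], dims="x", coords={"x": [0, 1, 2, 3]})

# Fill the NaN linearly along "x", using the coordinate values as positions.
filled = da.interpolate_na(dim="x", method="linear")
filled32 = da.astype("float32").interpolate_na(dim="x", method="linear")

print(filled.values)    # [0. 1. 2. 6.]
print(filled32.values)  # same values, float32 data
```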