pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.65k stars 1.09k forks source link

odd multidimensional extrapolation behaviour with interp #9261

Open tom-new opened 4 months ago

tom-new commented 4 months ago

What happened?

When using interp on a 3-dimensional DataArray to extract values on a 2-dimensional surface, there is no extrapolation along one of the dimensions of the 2D plane (but the other dimension works). In the MWE, I have created a 5x5x5 cube centred on the origin where x, y, and z run from -2 to 2, then interpolated along a 6x6 plane that is perpendicular to the y-axis, where the plane extends beyond the faces of the cube by 0.5 units. The values for which x are outside the cube extrapolate correctly, however the values for which z are outside the cube become nan.

What did you expect to happen?

Extrapolation should happen along both dimensions of the plane.

Minimal Complete Verifiable Example

>>>import numpy as np
>>>import xarray as xr
>>>x = np.linspace(-2,2,5)
>>>da = xr.DataArray(np.random.rand(5,5,5), coords={"x": x, "y": x, "z": x})
>>>xi = xr.DataArray(np.linspace(-2.5,2.5,6), dims="theta")
>>>yi = xr.DataArray(0.5*np.ones(6), dims="theta")
>>>zi = np.linspace(-2.5,2.5,6)
>>>da.interp(x=xi, y=yi, z=zi, kwargs={"fill_value": None})
<xarray.DataArray (theta: 6, z: 6)> Size: 288B
array([[       nan, 0.65509617, 0.65540961, 0.94953277, 0.96035659,
               nan],
       [       nan, 0.50551769, 0.50022194, 0.55698788, 0.47609523,
               nan],
       [       nan, 0.53349845, 0.55660868, 0.45080263, 0.36583418,
               nan],
       [       nan, 0.6058017 , 0.62612734, 0.49364122, 0.52795018,
               nan],
       [       nan, 0.67642216, 0.63931782, 0.56366723, 0.52998289,
               nan],
       [       nan, 0.87859657, 0.79462258, 0.79821649, 0.47355559,
               nan]])
Coordinates:
    x        (theta) float64 48B -2.5 -1.5 -0.5 0.5 1.5 2.5
    y        (theta) float64 48B 0.5 0.5 0.5 0.5 0.5 0.5
  * z        (z) float64 48B -2.5 -1.5 -0.5 0.5 1.5 2.5
Dimensions without coordinates: theta

MVCE confirmation

Relevant log output

No response

Anything else we need to know?

No response

Environment

/opt/homebrew/Caskroom/miniforge/base/envs/geopython/lib/python3.11/site-packages/_distutils_hack/__init__.py:26: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.") INSTALLED VERSIONS ------------------ commit: None python: 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) [Clang 16.0.6 ] python-bits: 64 OS: Darwin OS-release: 23.4.0 machine: arm64 processor: arm byteorder: little LC_ALL: None LANG: en_AU.UTF-8 LOCALE: ('en_AU', 'UTF-8') libhdf5: 1.14.3 libnetcdf: 4.9.2 xarray: 2024.5.0 pandas: 2.2.2 numpy: 1.26.4 scipy: 1.13.1 netCDF4: 1.6.5 pydap: None h5netcdf: None h5py: 3.11.0 zarr: None cftime: 1.6.4 nc_time_axis: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.8.4 cartopy: 0.23.0 seaborn: 0.13.2 numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 70.0.0 pip: 24.0 conda: None pytest: None mypy: None IPython: 8.25.0 sphinx: None
welcome[bot] commented 4 months ago

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

Illviljan commented 4 months ago

I suspect the trick of replacing the x, y dims with theta is not well tested. Is this supported?

The problem is that interpolation is done 2 times, with 2 different interpolants:

<function interpn at 0x0000020840209120> - fill_value=None for extrapolation https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interpn.html 
<class 'xarray.core.missing.ScipyInterpolator'>  - fill_value="extrapolate" for extrapolation https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html

Which is not so easy when you can only choose 1 fill_value...

The problem starts around here: https://github.com/pydata/xarray/blob/10bb94c28b639369a66106fb1352b027b30719ee/xarray/core/missing.py#L633-L636 not sure why we get 2 interpolation loops there, does it make sense? Maybe decompose_interp is buggy?

Not sure if this solves your actual problem, but I don't understand why you want to repeat yi 6 times. Instead, use a single value:

import numpy as np
import xarray as xr

x = np.linspace(-2, 2, 5)

da = xr.DataArray(
    np.arange(5 * 5 * 5).reshape(5, 5, 5), coords={"x": x, "y": x, "z": x}
)

xi = xr.DataArray(np.linspace(-2.5, 2.5, 6), dims="x")
yi = xr.DataArray([0.5], dims="y")
zi = xr.DataArray(np.linspace(-2.5, 2.5, 6), dims="z")

s = da.interp(x=xi, y=yi, z=zi, kwargs={"fill_value": "extrapolate"})
s = s.stack(theta=("x", "y")) # or s = s.squeeze()
tom-new commented 4 months ago

Yes, it is supported. It's the way the tutorials suggest to interpolate along a lower-dimensional space than the parent data (in the tutorial, x and y are associated with a common dimension name z to interpolate 2D data along a 1D line).

The reason I repeated yi 6 times is because in general yi may not just be one value—I chose it for the MWE for ease of mental visualisation. In my actual problem, I am trying to interpolate along a curve in x and y that is then projected down through z, resulting in a curved surface through the space (imaging bending a sheet of paper). I don't think this is really relevant to the bug though, but maybe I'm misunderstanding something.