pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.61k stars 1.08k forks source link

Interpolation - Support extrapolation method "clip" #3962

Open Illviljan opened 4 years ago

Illviljan commented 4 years ago

Hello,

I would like an option in da.interp()that instead of returning NaNs during extrapolation returns the data corresponding to the end of the breakpoint data set range.

One way to do this is to limit the new coordinates to the array coordinates minimum and maximum value, I did a simple example with this solution down below. I think this is a rather safe way as we are just modifying the inputs to all the various interpolation classes that xarray is using at the moment. But it does look a little weird when printing the extrapolated value, the coordinates shows the limited value instead of the requested coordinates. Maybe this can be handled elegantly somewhere in the source code?

MATLAB uses this quite frequently in their interpolation functions:

MCVE Code Sample

import numpy as np
import xarray as xr

def interp(da, coords, extrapolation='clip'):
    """
    Linear interpolation that clips the inputs to the coords min and max value.

    Parameters
    ----------
    da : DataArray
        DataArray to interpolate.
    coords : dict
        Coordinates for the interpolated value.
    """
    if extrapolation == 'clip':
        for k, v in da.coords.items():
            coords[k] = np.maximum(coords[k], np.min(v.values))
            coords[k] = np.minimum(coords[k], np.max(v.values))

    return da.interp(coords)

# Create coordinates:
x = np.linspace(1000, 6000, 4)
y = np.linspace(100, 1200, 3)

# Create data:
X = np.meshgrid(*[x, y], indexing='ij')
data = X[0] * X[1]

# Create DataArray:
da = xr.DataArray(data=data, coords=[('x', x), ('y', y)], name='data')

# Attempt to extrapolate:
datai = interp(da, {'x': 7000, 'y': 375})

Expected Output

print(datai)
<xarray.DataArray 'data' ()>
array(2250000.)
Coordinates:
    x        float64 6e+03
    y        float64 375.0

Versions

Output of `xr.show_versions()` INSTALLED VERSIONS ------------------ commit: None python: 3.7.7 (default, Mar 23 2020, 23:19:08) [MSC v.1916 64 bit (AMD64)] python-bits: 64 OS: Windows OS-release: 10 machine: AMD64 processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel byteorder: little LC_ALL: None LANG: en LOCALE: None.None libhdf5: 1.10.4 libnetcdf: None xarray: 0.15.0 pandas: 1.0.3 numpy: 1.18.1 scipy: 1.4.1 netCDF4: None pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2.13.0 distributed: 2.13.0 matplotlib: 3.1.3 cartopy: None seaborn: 0.10.0 numbagg: None setuptools: 46.1.3.post20200330 pip: 20.0.2 conda: 4.8.3 pytest: 5.4.1 IPython: 7.13.0 sphinx: 2.4.4
dcherian commented 4 years ago

Scipy will do this for you if you ask it to:

da.interp(x=7000, y=375, method="linear", kwargs=dict(fill_value=None))

See https://github.com/pydata/xarray/pull/3956 .

Illviljan commented 4 years ago

Nice find! The documentation doesn't explain that at all currently.

But that solution doesn't work for 1d DataArrays. You have to use this kwargs instead:

kwargs = dict(fill_value=(np.min(da.data), np.max(da.data)))

So it still isn't as convenient as I would like it to be. The wrapper function can now be simplified to this however:

import numpy as np
import xarray as xr

def interp(da, coords, extrapolation='clip'):
    """
    Linear interpolation that clips the inputs to the coords min and max value.

    Parameters
    ----------
    da : DataArray
        DataArray to interpolate.
    coords : dict
        Coordinates for the interpolated value.
    """
    if extrapolation == 'clip':
        if len(da.coords) > 1:
            kwargs = dict(fill_value=None)
        else:
            kwargs = dict(fill_value=(np.min(da.data), np.max(da.data)))
    return da.interp(coords, kwargs=kwargs)

# Create coordinates:
x = np.linspace(1000, 6000, 4)
y = np.linspace(100, 1200, 3)
z = np.linspace(1, 2, 2)

# Create 1D DataArray:
da1 = xr.DataArray(data=2*x, coords=[('x', x)])

# Create 2D DataArray:
X = np.meshgrid(*[x, y], indexing='ij')
data = X[0] * X[1]
da2 = xr.DataArray(data=data, coords=[('x', x), ('y', y)])

# Create 3D DataArray:
X = np.meshgrid(*[x, y, z], indexing='ij')
data = X[0] * X[1] * X[2]
da3 = xr.DataArray(data=data, coords=[('x', x), ('y', y), ('z', z)])

# Attempt to extrapolate:
print(interp(da1, {'x': 7000}))
print(interp(da2, {'x': 7000, 'y': 375}))
print(interp(da3, {'x': 7000, 'y': 375, 'z': 1}))
dcherian commented 4 years ago

As noted in #3956 fill_value="extrapolate" is what we need for 1D.

xarray could use the same interpolation function for both 1d and nd.

Not sure if there's a big performance gain. @jhamman implemented these interpolators originally AFAIK. Perhaps he remembers why we use both.

stale[bot] commented 2 years ago

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically