pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

perf improvement for interp: set `assume_sorted` automatically #9758

Open dcherian opened 1 week ago

dcherian commented 1 week ago

What is your issue?

`assume_sorted` defaults to `False`, so for vectorized interpolation across multiple dimensions we end up lexsorting the coordinates every time. For some reason, this can be quite slow with dask.

https://github.com/pydata/xarray/blob/6df8bd606a8a9a3378c7672c087e08ced00b2e15/xarray/core/dataset.py#L4081

Instead we should be able to do

obj = self
# sort by slicing if we can
for coord in set(indexers) & set(self._indexes):
    # TODO: better check for PandasIndex
    if self.indexes[coord].is_monotonic_decreasing:
        obj = obj.isel({coord: slice(None, None, -1)})

# TODO: make None the new default
if assume_sorted is None:
    # TODO: dims without coordinates are fine too
    # check `obj`, not `self`: decreasing coords may have been reversed above
    assume_sorted = all(obj.indexes[coord].is_monotonic_increasing for coord in indexers)
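
For anyone who wants to try the idea outside of `Dataset.interp`, here is a rough standalone sketch using only public API. `presort_for_interp` is a hypothetical helper name, not something in xarray, and skipping dimensions without an index is an assumption based on the TODO above rather than verified behaviour.

```python
import numpy as np
import xarray as xr


def presort_for_interp(obj, indexers):
    """Hypothetical helper: return (obj, assume_sorted) for a later .interp call.

    Monotonically decreasing indexed coordinates are reversed with a slice,
    which avoids an explicit sort of the coordinate values.
    """
    # Only look at indexed coordinates that are being interpolated over.
    for coord in set(indexers) & set(obj.indexes):
        index = obj.indexes[coord]
        # A constant or length-1 index is both increasing and decreasing,
        # so only flip when it is not already increasing.
        if index.is_monotonic_decreasing and not index.is_monotonic_increasing:
            obj = obj.isel({coord: slice(None, None, -1)})

    # Assumption: dimensions without an index need no sorting and are skipped.
    assume_sorted = all(
        obj.indexes[coord].is_monotonic_increasing
        for coord in indexers
        if coord in obj.indexes
    )
    return obj, assume_sorted


# Example data: "x" is decreasing, "y" is increasing.
da = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=("x", "y"),
    coords={"x": [30, 20, 10, 0], "y": [0, 1, 2]},
)
flipped, flag = presort_for_interp(da, {"x": [5, 15], "y": [0.5]})
```

The pair can then be passed on as `flipped.interp(x=[5, 15], y=[0.5], assume_sorted=flag)` (that call needs scipy installed). If any requested coordinate is neither increasing nor decreasing, `flag` comes back `False` and `interp` falls back to sorting as it does today.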

I'll add a reproducible example later, but graph construction gets much faster for the problem I've been playing with:

image

xref #6799

cc @mpiannucci @Illviljan