pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Optimize idxmin, idxmax with dask #9800

Open dcherian opened 2 days ago

dcherian commented 2 days ago

cc @phofl: here we need to index a NumPy array with a dask array (commonly a much larger one) in a sane manner.
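For context, a minimal NumPy-only sketch of the indexing pattern in question. It is not the code in this PR: the array sizes, the `time_coord` labels, and the block size are made up for illustration, and the explicit loop over blocks only stands in for what a chunked dask indexer would do. The point is that indexing the small 1-D coordinate block by block gives the same result as indexing it all at once, so each output block can keep the shape of its indexer block.

```python
import numpy as np

# Small stand-in for the example data, laid out as (time, x, y).
t, x, y = 5, 4, 6
rng = np.random.default_rng(0)
data = rng.random((t, x, y))

# Hypothetical coordinate labels along "time" (the small NumPy array).
time_coord = np.arange(t) * 10

# Step 1: integer position of the minimum along "time".
# In the dask case this is the (much larger) chunked indexer.
indices = data.argmin(axis=0)  # shape (x, y)

# Step 2: look up the coordinate label at each position, all at once.
idxmin_full = time_coord[indices]

# Same lookup, one block of the indexer at a time. Each output block
# has exactly the shape of the indexer block it came from.
block = 2  # illustrative block size, not a real chunk size
idxmin_blockwise = np.empty_like(idxmin_full)
for i in range(0, x, block):
    for j in range(0, y, block):
        idx_block = indices[i:i + block, j:j + block]
        idxmin_blockwise[i:i + block, j:j + block] = time_coord[idx_block]
```

Because the coordinate is small, every block can hold a full copy of it, and no block needs data from its neighbours.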

We now preserve chunk sizes for the following example:

import numpy as np
import xarray as xr

# create some dummy data and chunk
x, y, t = 1000, 1000, 57
rang = np.arange(t*x*y)
da = xr.DataArray(rang.reshape(t, x, y), coords={'time':range(t), 'x': range(x), 'y':range(y)})
da = da.chunk(dict(time=-1, x=256, y=256))
da.idxmin('time')

After

image

Before

image