pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.62k stars 1.09k forks source link

Composite operations on chunked array types not dispatched through chunk manager #9523

Open jeremiah-corrado opened 1 month ago

jeremiah-corrado commented 1 month ago

What happened?

The arkouda-xarray chunk manager (found here) is designed to allow Arkouda's distributed arrays to operate as chunked arrays in XArray. (Note Arkouda's array's also implement the array API, which was being used for XArray interoperability before the chunk manager was created).

When executing various operations on XArray DataArrays or DataSets that are backed by the Arkouda chunk manager, I noticed that they are not being routed through the chunk manager itself. For example, calling myArkoudaBackedArray.mean(axis="some-axis") does not get routed through ArkoudaManager.reduction. However, the operation is successful and produces the correct result, indicating that the array-api implementation of mean is being used instead (i.e., the arkouda-backed array is being treated as a duck array).

https://github.com/jeremiah-corrado/arkouda-xarray-seasonal-avgs-example/blob/main/README.md contains a full example where I observed the above behavior. It includes instructions to get Arkouda set up for XArray interoperability.

pinging @TomNicholas, since we discussed this the other day.

MVCE confirmation

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS ------------------ commit: None python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ] python-bits: 64 OS: Darwin OS-release: 22.6.0 machine: arm64 processor: arm byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: None xarray: 2024.5.0 pandas: 2.2.2 numpy: 1.26.4 scipy: 1.13.1 netCDF4: None pydap: None h5netcdf: None h5py: 3.8.0 zarr: None cftime: 1.6.3 nc_time_axis: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.8.4 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 70.0.0 pip: 24.0 conda: None pytest: None mypy: None IPython: 8.24.0 sphinx: None
keewis commented 1 month ago

looking at the docstring of the chunk manager base class, it appears this is not used for all reductions: https://github.com/pydata/xarray/blob/d26144db962d9d620c7f7c4e7398c2f6b3bf6d47/xarray/namedarray/parallelcompat.py#L386-L390

In other words, this seems to be expected behavior? I wonder if var.reduce(np.mean) would go through the reduction method, though.

TomNicholas commented 1 month ago

In other words, this seems to be expected behavior?

Reminding myself of how this works, I think yes, this is expected. The array type knows how to do a chunked mean on a chunked array without any further guidance from xarray, so it's fine to just use the array API to call the correct implementation of mean.

I wonder if var.reduce(np.mean) would go through the reduction method, though.

It won't (see NamedArray.reduce)

https://github.com/pydata/xarray/blob/cde720f21788a1dbff2bdb0775a56b3ee6da6b1a/xarray/namedarray/core.py#L914

but I also don't think it needs to either. reduce is really a special case of the more general reduction primitive. reduction accepts the func kwarg that reduce accepts, but also accepts combine_func and aggregate_func. Basically these allow tree-reductions made of alternating steps, rather than just the same function applied over and over. This comes from dask.array.reduction, and is needed in the chunkmanager because it is used in xarray (only once though apparently, in the implementation .first and .last methods).

https://github.com/pydata/xarray/blob/cde720f21788a1dbff2bdb0775a56b3ee6da6b1a/xarray/core/duck_array_ops.py#L795

If arkouda chose not to implement ArkoudaChunkManager.reduction I think the only thing that would break would be the .first and .last methods. .reduce would in general work, but only if the function you passed into .reduce knew how to handle the Arkouda Array type.

So actually other than documenting / testing this a bit better I believe there is not actually a bug to be fixed here?

jeremiah-corrado commented 1 month ago

I think the only thing that would break would be the .first and .last methods. .reduce would in general work, but only if the function you passed into .reduce knew how to handle the Arkouda Array type.

Okay that makes sense. Thank you both for digging into the implementation/docs to figure that out. I'll work on implementing reduction properly in Arkouda's chunk manager so that those methods work.

I believe there is not actually a bug to be fixed here?

I think that's correct