pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Hooks for XArray operations #1938

Open hameerabbasi opened 6 years ago

hameerabbasi commented 6 years ago

In the hope of cleaner Dask and sparse support (pydata/sparse#1), I wanted to suggest hooks for xarray operations.

Something like the following:

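# Note: xarray.hooks is the API being proposed here; it does not exist yet.
import xarray

try:
    import dask.array as da
    # da.Array is the Dask array class (da.array is a constructor function).
    xarray.hooks.register('nansum', da.Array, da.nansum)
    ...
except ImportError:
    pass

try:
    # Import the package; a class cannot be the target of a plain "import".
    import sparse
    xarray.hooks.register('nansum', sparse.SparseArray, sparse.nansum)
    ...
except ImportError:
    pass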

Functions would then look up the hook registered for the type of the array they receive (the registry would fall back to NumPy if nothing is found).
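To make the idea concrete, here is a minimal sketch of what such a registry could look like. None of these names exist in xarray: `register`, `dispatch`, the `_HOOKS` table and the toy `Tagged` class are all assumptions made up for illustration, and a real design would likely differ.

```python
import numpy as np

# Hypothetical registry: maps (operation name, array type) -> implementation.
_HOOKS = {}


def register(name, array_type, func):
    """Register `func` as the implementation of `name` for `array_type`."""
    _HOOKS[(name, array_type)] = func


def dispatch(name, array):
    """Return the implementation of `name` for `array`, falling back to NumPy."""
    # Walk the MRO so subclasses pick up hooks registered for a parent class.
    for cls in type(array).__mro__:
        func = _HOOKS.get((name, cls))
        if func is not None:
            return func
    # Nothing registered for this type: use the NumPy function of the same name.
    return getattr(np, name)


def nansum(array, **kwargs):
    """Example operation that routes through the registry."""
    return dispatch("nansum", array)(array, **kwargs)


class Tagged:
    """Toy array type, used only to show a registered hook being picked up."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)


# The hook tags its result so it is visible which implementation ran.
register("nansum", Tagged, lambda a, **kw: ("tagged", np.nansum(a.data, **kw)))

plain_result = nansum(np.array([1.0, np.nan, 2.0]))  # no hook: falls back to np.nansum
tagged_result = nansum(Tagged([1.0, np.nan, 2.0]))   # uses the registered hook
```

Keying the table on the array type keeps the fallback trivial, but it only dispatches on the first argument; operations that mix array types would need something richer.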

I would argue that this should be in NumPy, but it's a huge project to put it there.

mrocklin commented 5 years ago

@jacobtomlinson got things sorta-working with NEP-18 and CuPy in an afternoon in Iris (with a strong emphasis on "kinda").

On the CuPy side you're fine. If you're on NumPy 1.16 you'll need to enable the __array_function__ interface with the following environment variable:

export NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1

If you're using NumPy 1.17 or later then this is on by default.
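A quick way to check whether the protocol is active in a given environment is to pass a toy duck array through a NumPy function and see whether it gets dispatched. The sketch below assumes NumPy 1.17 or later (or 1.16 with the variable above set); `LoggingArray` is a made-up class for illustration, not part of any library.

```python
import numpy as np


class LoggingArray:
    """Toy duck array that records which NumPy functions were dispatched to it."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.calls = []

    def __array_function__(self, func, types, args, kwargs):
        # Record the call, unwrap any LoggingArray arguments, and defer to NumPy.
        self.calls.append(func.__name__)
        unwrapped = [a.data if isinstance(a, LoggingArray) else a for a in args]
        return func(*unwrapped, **kwargs)


arr = LoggingArray([1, 2, 3])
total = np.sum(arr)

# True when NumPy routed np.sum through __array_function__.
protocol_active = arr.calls == ["sum"]
```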

I think that most of the work here is on the Xarray side. We'll need to remove things like explicit type checks.

hameerabbasi commented 5 years ago

@rabernat I can attend remotely.

shoyer commented 5 years ago

We're at the point where this could be hacked together pretty quickly:

  1. We need to remove the explicit casting to NumPy arrays (à la https://github.com/pydata/xarray/pull/2956). Checking for an __array_function__ attribute is probably a good heuristic for duck arrays (it's what dask is using).
  2. Internally, we need to use NumPy functions directly (if __array_function__ is enabled) instead of our current Dask/NumPy versions. Fortunately, pretty much all this logic lives in one place, in xarray.core.duck_array_ops.
  3. We'll need to think a little bit about indexing in particular. Right now we have special indexing wrappers for NumPy arrays and Dask arrays; we would need to decide how to handle arbitrary array objects (probably by indexing them like NumPy arrays?). Basic indexing should work either way, but indexing with arrays can be a little tricky since few duck-array types support NumPy's full semantics (which are pretty complex).
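As a rough illustration of the first point, the attribute check and the "don't cast duck arrays" rule could look like the sketch below. The helper names `is_duck_array` and `keep_or_cast` and the `MinimalDuck` class are hypothetical, chosen for this example; this is not xarray's actual code.

```python
import numpy as np


def is_duck_array(value):
    """Heuristic from the list above: objects exposing __array_function__ count as arrays."""
    return hasattr(value, "__array_function__")


def keep_or_cast(value):
    """Hypothetical helper: leave duck arrays untouched, cast everything else to NumPy."""
    return value if is_duck_array(value) else np.asarray(value)


class MinimalDuck:
    """Smallest possible object that passes the heuristic (illustration only)."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Declines every NumPy function; enough to pass the attribute check.
        return NotImplemented


duck = MinimalDuck([1, 2, 3])
kept = keep_or_cast(duck)       # same object comes back, no conversion
cast = keep_or_cast([1, 2, 3])  # a plain list is converted to np.ndarray
```

The heuristic is deliberately loose: it says nothing about whether the object supports the indexing semantics discussed in the third point, which would still need separate handling.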