pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Regression: "TypeError: Vectorized indexing is not supported" with xarray 2024.10.0 + sparse #9694

Open khaeru opened 3 weeks ago

khaeru commented 3 weeks ago

What happened?

Code that worked with xarray 2024.9.0 has begun to fail with xarray 2024.10.0, even though no breaking changes were advertised.

What did you expect to happen?

The MCVE below runs with xarray 2024.9.0 and gives the output shown below.

I'd expect it to run the same way with xarray 2024.10.0.

Minimal Complete Verifiable Example

import pandas as pd
import xarray as xr
from numpy import nan

# Create a series
s = pd.Series(
    [nan, nan, 1.0, nan, nan, nan, 2, 3, 4, nan, 5, 6, 7, 8, 9],
    index=pd.MultiIndex.from_product(
        [["x0", "x1", "x2"], ["y0", "y1", "y2", "y3", "y4"]], names=list("xy")
    ),
)

# Create indexers
newdim = {"newdim": ["nd0", "nd1", "nd2"]}
x_idx = xr.DataArray(["x2", "x1", "x2"], coords=newdim)
y_idx = xr.DataArray(["y4", "y2", "y0"], coords=newdim)

for sparse in (False, True):
    # Create a DataArray
    da = xr.DataArray.from_series(s, sparse=sparse)

    # Do vectorized indexing
    print(da.sel(x=x_idx, y=y_idx))

Relevant log output

With xarray 2024.9.0:

<xarray.DataArray (newdim: 3)> Size: 24B
array([9., 3., 5.])
Coordinates:
    x        (newdim) object 24B 'x2' 'x1' 'x2'
    y        (newdim) object 24B 'y4' 'y2' 'y0'
  * newdim   (newdim) <U3 36B 'nd0' 'nd1' 'nd2'
<xarray.DataArray (newdim: 3)> Size: 48B
<COO: shape=(3,), dtype=float64, nnz=3, fill_value=nan>
Coordinates:
    x        (newdim) object 24B 'x2' 'x1' 'x2'
    y        (newdim) object 24B 'y4' 'y2' 'y0'
  * newdim   (newdim) <U3 36B 'nd0' 'nd1' 'nd2'

With xarray 2024.10.0:

<xarray.DataArray (newdim: 3)> Size: 24B
array([9., 3., 5.])
Coordinates:
    x        (newdim) object 24B 'x2' 'x1' 'x2'
    y        (newdim) object 24B 'y4' 'y2' 'y0'
  * newdim   (newdim) <U3 36B 'nd0' 'nd1' 'nd2'
Traceback (most recent call last):
  File "/home/khaeru/vc/genno/bug.py", line 23, in <module>
    print(da.sel(x=x_idx, y=y_idx))
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khaeru/.venv/3.12/lib/python3.12/site-packages/xarray/core/dataarray.py", line 1675, in sel
    ds = self._to_temp_dataset().sel(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khaeru/.venv/3.12/lib/python3.12/site-packages/xarray/core/dataset.py", line 3237, in sel
    result = self.isel(indexers=query_results.dim_indexers, drop=drop)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khaeru/.venv/3.12/lib/python3.12/site-packages/xarray/core/dataset.py", line 3070, in isel
    return self._isel_fancy(indexers, drop=drop, missing_dims=missing_dims)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khaeru/.venv/3.12/lib/python3.12/site-packages/xarray/core/dataset.py", line 3126, in _isel_fancy
    new_var = var.isel(indexers=var_indexers)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khaeru/.venv/3.12/lib/python3.12/site-packages/xarray/core/variable.py", line 1049, in isel
    return self[key]
           ~~~~^^^^^
  File "/home/khaeru/.venv/3.12/lib/python3.12/site-packages/xarray/core/variable.py", line 816, in __getitem__
    data = indexing.apply_indexer(indexable, indexer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khaeru/.venv/3.12/lib/python3.12/site-packages/xarray/core/indexing.py", line 1029, in apply_indexer
    return indexable.vindex[indexer]
           ~~~~~~~~~~~~~~~~^^^^^^^^^
  File "/home/khaeru/.venv/3.12/lib/python3.12/site-packages/xarray/core/indexing.py", line 369, in __getitem__
    return self.getter(key)
           ^^^^^^^^^^^^^^^^
  File "/home/khaeru/.venv/3.12/lib/python3.12/site-packages/xarray/core/indexing.py", line 1589, in _vindex_get
    raise TypeError("Vectorized indexing is not supported")
TypeError: Vectorized indexing is not supported

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.12.7 (main, Oct 3 2024, 15:15:22) [GCC 14.2.0]
python-bits: 64
OS: Linux
OS-release: 6.11.0-9-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: ('en_CA', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.10.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: 1.4.0
dask: 2024.5.1
distributed: None
matplotlib: 3.9.0
cartopy: None
seaborn: 0.13.1
numbagg: None
fsspec: 2023.12.2
cupy: None
pint: 0.24.1
sparse: 0.15.4
flox: None
numpy_groupies: None
setuptools: 69.0.3
pip: 24.2
conda: None
pytest: 8.3.3
mypy: 1.11.2
IPython: 8.17.1
sphinx: 8.1.3

keewis commented 3 weeks ago

thanks for the report!

This was introduced in #9530, and it was definitely not intentional to break vectorized indexing with sparse (looks like I forgot to add a release note entry, which I would have put into "new features").

The issue here is the order of preference for the indexing adapters, which was switched from preferring __array_function__ to preferring __array_namespace__ if both are present. However, it looks like I missed sparse when investigating whether any array type in use already implemented both (and our tests didn't catch it either).

There are two things I believe we can do (besides reverting if we need more time to figure out a good way to resolve this):

  1. allow specifying the preferred protocol by setting an attribute on the array type (with the default being "__array_function__")
  2. work around the Array API not including vectorized indexing / indexing with an integer array

not sure which one will be better in the end
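
To make the preference-order problem and option 1 concrete, here is a minimal sketch. It is not xarray's actual adapter-selection code: `pick_protocol`, the `preferred_indexing_protocol` attribute, and the stand-in classes are all hypothetical names invented for illustration. It only assumes that an array type can expose both `__array_function__` and `__array_namespace__`, as sparse apparently does.

```python
# Hypothetical sketch of protocol selection; none of these names exist in xarray.
PROTOCOLS = ("__array_function__", "__array_namespace__")


class BothProtocols:
    """Stand-in for an array type that implements both protocols."""

    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented

    def __array_namespace__(self, api_version=None):
        return None


class OnlyNamespace:
    """Stand-in for an array type that implements only the Array API."""

    def __array_namespace__(self, api_version=None):
        return None


class PrefersNamespace(BothProtocols):
    """Implements both, but opts in to the Array API via a class attribute."""

    # Hypothetical attribute name, illustrating option 1.
    preferred_indexing_protocol = "__array_namespace__"


def pick_protocol(array, default="__array_function__"):
    """Return the name of the protocol whose adapter would be used, or None."""
    array_type = type(array)
    # Option 1: the array type may declare its preference; otherwise use the default.
    preferred = getattr(array_type, "preferred_indexing_protocol", default)
    order = (preferred,) + tuple(p for p in PROTOCOLS if p != preferred)
    for protocol in order:
        if hasattr(array_type, protocol):
            return protocol
    return None


# Default preference: a type with both protocols takes the __array_function__ path.
print(pick_protocol(BothProtocols()))
# Flipped preference (roughly the 2024.10.0 situation): the same type is now
# routed to the Array API path, which lacks vectorized indexing.
print(pick_protocol(BothProtocols(), default="__array_namespace__"))
# A type that opts in explicitly is unaffected by the default.
print(pick_protocol(PrefersNamespace()))
```

With a per-type attribute like this, changing the global default would not silently reroute array types that already implement both protocols.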

dcherian commented 3 weeks ago

> work around the Array API not including vectorized indexing / indexing with an integer array

Let's move ahead with this. We already have the code in explicit_indexing_adapter, we just need to figure out the right IndexingSupport enum variant for whatever array api prescribes.
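
As a rough illustration of what such a workaround can look like (this is not the code in explicit_indexing_adapter), the sketch below decomposes a pointwise selection into two steps: per-axis `take` calls on the original array, then vectorized indexing on the small NumPy block that results. `np.take` stands in here for the Array API `take` function, which is an assumption about what the wrapped array offers; `pointwise_via_take` is a hypothetical helper name. The data and positions mirror the MCVE above.

```python
import numpy as np


def pointwise_via_take(arr, x_pos, y_pos):
    """Pointwise (vectorized) selection on a 2-D array using only per-axis take.

    Hypothetical helper: assumes `arr` supports take along one axis at a time
    and that the reduced block can be converted to a NumPy array.
    """
    x_pos = np.asarray(x_pos)
    y_pos = np.asarray(y_pos)
    # Unique positions per axis, plus where each request lands in the reduced block.
    ux, inv_x = np.unique(x_pos, return_inverse=True)
    uy, inv_y = np.unique(y_pos, return_inverse=True)
    # Outer (orthogonal) selection: one axis at a time.
    block = np.take(np.take(arr, ux, axis=0), uy, axis=1)
    # The block is small, so finish with NumPy's own vectorized indexing.
    return np.asarray(block)[inv_x, inv_y]


nan = np.nan
# Same values as the MCVE series, laid out as (x, y) = (3, 5).
data = np.array(
    [nan, nan, 1.0, nan, nan, nan, 2, 3, 4, nan, 5, 6, 7, 8, 9]
).reshape(3, 5)

# Positions of (x2, y4), (x1, y2), (x2, y0).
result = pointwise_via_take(data, [2, 1, 2], [4, 2, 0])
print(result)  # [9. 3. 5.]
```

The trade-off is that the intermediate block has one row and one column per unique requested position, so it can be much larger than the final result when the requested points are scattered.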

shoyer commented 3 weeks ago

> > work around the Array API not including vectorized indexing / indexing with an integer array
>
> Let's move ahead with this. We already have the code in explicit_indexing_adapter, we just need to figure out the right IndexingSupport enum variant for whatever array api prescribes.

The array API has discussed adding support for vectorized indexing in the near future: https://github.com/data-apis/array-api/issues/669

Hopefully this happens!