pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

`DataArray.drop_sel()` along a MultiIndex-ed dimension drops too much data #6354

Open akukuq opened 2 years ago

akukuq commented 2 years ago

What happened?

Calling the drop_sel method on a DataArray object produces an incorrect result if the dimension involved is indexed by a MultiIndex.

Assuming that dimension 'direction' is indexed by a MultiIndex with named levels 'azimuth' and 'elevation', one can attempt to drop a single item along this dimension in two ways: .drop_sel(direction=(0.0, 0.0)) or .drop_sel(azimuth=0.0, elevation=0.0).

What did you expect to happen?

I expected that at least the first call would result in a correctly dropped item along dimension 'direction'. This expectation is based on the fact that calling .sel(direction=(0.0, 0.0)) works and returns the element where both 'azimuth' and 'elevation' are 0.0.

It would be convenient for the second call to also work, but the analogous call to .sel(azimuth=0.0, elevation=0.0) does not work either, so I guess this is unsupported.

Minimal Complete Verifiable Example

import numpy as np
import xarray as xr

# Create a DataArray with a MultiIndex-ed dimension:
da = xr.DataArray(
    np.reshape(np.arange(16), (4, 4)),
    dims=['azimuth', 'elevation'],
    coords={
        'azimuth': range(4),
        'elevation': range(4),
    }).stack(direction=('azimuth', 'elevation'))

# Correctly selects a single item along 'direction'
da.sel(direction=(0, 0))

# Incorrectly removes all items along 'direction' where azimuth==0...
da.drop_sel(direction=(0, 0))

# ...but it should remove just one item, like this call:
da.drop_isel(direction=0)

# This fails with 'ValueError: dimension 'azimuth' does not have coordinate labels':
da.drop_sel(azimuth=0, elevation=0)
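
Until drop_sel handles this case, one possible workaround is to compare full keys against the underlying pandas MultiIndex and select the remaining positions with isel. The sketch below is only an illustration: drop_multiindex_keys is a hypothetical helper, not part of xarray, and it assumes that keys is a non-empty list of full (azimuth, elevation) tuples. Unlike drop_sel, it silently ignores keys that are not present.

```python
import numpy as np
import xarray as xr


def drop_multiindex_keys(da, dim, keys):
    """Hypothetical helper (not an xarray API): drop the entries along `dim`
    whose full MultiIndex key appears in the non-empty list `keys`."""
    index = da.indexes[dim]                       # underlying pandas.MultiIndex
    keep = ~index.isin(list(keys))                # True where the full key is not listed
    return da.isel({dim: np.flatnonzero(keep)})   # keep the remaining positions


# Same array as in the example above
da = xr.DataArray(
    np.reshape(np.arange(16), (4, 4)),
    dims=['azimuth', 'elevation'],
    coords={'azimuth': range(4), 'elevation': range(4)},
).stack(direction=('azimuth', 'elevation'))

# Drop only the item where azimuth == 0 and elevation == 0
result = drop_multiindex_keys(da, 'direction', [(0, 0)])
print(result.sizes['direction'])
```

Because the selection is positional, the other items with azimuth == 0 stay in the result.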

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.10.0-8-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_DK.UTF-8
LOCALE: ('en_DK', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.1

xarray: 0.20.1
pandas: 1.4.1
numpy: 1.21.2
scipy: 1.7.3
netCDF4: 1.5.7
pydap: None
h5netcdf: 999
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: None
distributed: None
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 58.0.4
pip: 21.2.4
conda: None
pytest: None
IPython: 8.1.1
sphinx: None

lukelbd commented 1 year ago

Just ran into this issue. It's less insidious if the labels in each multi-index level are unique -- in that case it just raises an InvalidIndexError instead of silently dropping extra data.
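
The over-dropping can be reproduced with pandas alone when a single key tuple is treated as a sequence of separate labels. Whether this is what xarray does internally is an assumption here, not something confirmed in this thread; the snippet only shows how MultiIndex.drop reacts to the two input shapes.

```python
import numpy as np
import pandas as pd

# A 4 x 4 product index, matching the stacked 'direction' dimension above
index = pd.MultiIndex.from_product(
    [range(4), range(4)], names=['azimuth', 'elevation']
)

# A bare tuple converted to an array becomes two separate labels (0 and 0);
# each one is matched against the first level, so every azimuth == 0 entry goes.
dropped_as_labels = index.drop(np.asarray((0, 0)))

# A list containing one tuple is treated as a single full key.
dropped_as_key = index.drop([(0, 0)])

print(len(index), len(dropped_as_labels), len(dropped_as_key))
```

The difference in length between the two results matches the behaviour reported above for drop_sel versus drop_isel.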

analkumar2 commented 2 months ago

I am also facing the same issue.