pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

`isel(multi_index_level_name = MultiIndex.level)` corrupts the MultiIndex #8952

Open dcherian opened 3 months ago

dcherian commented 3 months ago

What happened?

From https://github.com/pydata/xarray/discussions/8951

If d is a MultiIndex-ed dataset with levels (x, y, z), and m is a dataset with a single coord x, then m.isel(x=d.x) builds a dataset with a MultiIndex with levels (y, z): the x level is dropped. This seems like it should work.

cc @benbovy

What did you expect to happen?

No response

Minimal Complete Verifiable Example

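import numpy as np
import pandas as pd
import xarray as xr

xr.set_options(use_flox=True)

# 100 rows: x cycles through 0..9, while y, z and v all run from 0 to 99
test = pd.DataFrame()
test["x"] = np.arange(100) % 10
test["y"] = np.arange(100)
test["z"] = np.arange(100)
test["v"] = np.arange(100)

# d: a single "index" dimension backed by a MultiIndex with levels (x, y, z)
d = xr.Dataset.from_dataframe(test)
d = d.set_index(index=["x", "y", "z"])
print(d)

# m: per-x means, with a single coordinate x
m = d.groupby("x").mean()
print(m)

print(d.xindexes)
# The result should keep levels (x, y, z), but the x level is missing
print(m.isel(x=d.x).xindexes)

# Raises ValueError
xr.align(d, m.isel(x=d.x))
#res = d.groupby("x") - m
#print(res)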
<xarray.Dataset>
Dimensions:  (index: 100)
Coordinates:
  * index    (index) object MultiIndex
  * x        (index) int64 0 1 2 3 4 5 6 7 8 9 0 1 2 ... 8 9 0 1 2 3 4 5 6 7 8 9
  * y        (index) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
  * z        (index) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
Data variables:
    v        (index) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
<xarray.Dataset>
Dimensions:  (x: 10)
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    v        (x) float64 45.0 46.0 47.0 48.0 49.0 50.0 51.0 52.0 53.0 54.0
Indexes:
  ┌ index    PandasMultiIndex
  │ x
  │ y
  └ z
Indexes:
  ┌ index    PandasMultiIndex
  │ y
  └ z
ValueError...

Relevant log output

No response

Anything else we need to know?

No response

benbovy commented 3 months ago

I think this occurs with fancy indexing of an xarray object (i.e., passing another DataArray as the indexer argument to isel) when the same coordinate name is found in both the indexed object and the indexer.

Remove the name conflict and it works fine, e.g.,

xr.align(d, m.rename(x="w").isel(w=d.x))

In such a case, the coordinate in the indexer should probably be passed to the result instead of the one found in the indexed object. That is not the current behavior, although I haven't checked how the coordinates are merged in the result.
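
A minimal, self-contained sketch of the rename workaround above, for anyone who wants to check it locally. It rebuilds d as in the original example. As an assumption made only to keep the sketch independent of groupby and flox, m is written out by hand with the per-x means shown in the issue output (45.0 to 54.0). The names broadcast, aligned_d and aligned_b are illustrative and do not come from the issue.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same data as the example in the issue: 100 rows, x cycles through 0..9.
df = pd.DataFrame(
    {
        "x": np.arange(100) % 10,
        "y": np.arange(100),
        "z": np.arange(100),
        "v": np.arange(100),
    }
)
d = xr.Dataset.from_dataframe(df).set_index(index=["x", "y", "z"])

# Assumption: the per-x means are written out by hand (45.0 .. 54.0, as in
# the issue output) instead of being computed with d.groupby("x").mean().
m = xr.Dataset({"v": ("x", 45.0 + np.arange(10))}, coords={"x": np.arange(10)})

# Workaround: rename the dimension of m so that it no longer collides with
# the "x" level carried by the indexer d.x.
broadcast = m.rename(x="w").isel(w=d.x)

# The MultiIndex coordinates come from the indexer.
print(broadcast.xindexes)

# Alignment goes through, since both objects share the same "index" dimension.
aligned_d, aligned_b = xr.align(d, broadcast)
```

Here broadcast["v"] holds, for each row of d, the mean of that row's x group, which is what the commented-out `d.groupby("x") - m` in the original example was after.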