What happened?
I tried using the new multi-dimensional grouping added in #9372, with one BinGrouper per dimension. I'm using xarray 2024.09.0. If I construct the BinGroupers so that some bins end up empty, I get an IndexError:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[9], line 1
----> 1 ds.groupby(x=BinGrouper(np.arange(0,13,4)), y=BinGrouper(bins=np.arange(0,16,2)))
File /home/me/.conda/envs/xarray_2024.09/lib/python3.12/site-packages/xarray/util/deprecation_helpers.py:118, in _deprecate_positional_args.<locals>._decorator.<locals>.inner(*args, **kwargs)
114 kwargs.update({name: arg for name, arg in zip_args})
116 return func(*args[:-n_extra_args], **kwargs)
--> 118 return func(*args, **kwargs)
File /home/me/.conda/envs/xarray_2024.09/lib/python3.12/site-packages/xarray/core/dataset.py:10444, in Dataset.groupby(self, group, squeeze, restore_coord_dims, **groupers)
10441 _validate_groupby_squeeze(squeeze)
10442 rgroupers = _parse_group_and_groupers(self, group, groupers)
> 10444 return DatasetGroupBy(self, rgroupers, restore_coord_dims=restore_coord_dims)
File /home/me/.conda/envs/xarray_2024.09/lib/python3.12/site-packages/xarray/core/groupby.py:581, in GroupBy.__init__(self, obj, groupers, restore_coord_dims)
573 if any(
574 isinstance(obj._indexes.get(grouper.name, None), PandasMultiIndex)
575 for grouper in groupers
576 ):
577 raise NotImplementedError(
578 "Grouping by multiple variables, one of which "
579 "wraps a Pandas MultiIndex, is not supported yet."
580 )
--> 581 self.encoded = ComposedGrouper(groupers).factorize()
583 # specification for the groupby operation
584 # TODO: handle obj having variables that are not present on any of the groupers
585 # simple broadcasting fails for ExtensionArrays.
586 (self.group1d, self._obj, self._stacked_dim, self._inserted_dims) = _ensure_1d(
587 group=self.encoded.codes, obj=obj
588 )
File /home/me/.conda/envs/xarray_2024.09/lib/python3.12/site-packages/xarray/core/groupby.py:470, in ComposedGrouper.factorize(self)
464 midx = pd.MultiIndex.from_product(
465 (grouper.unique_coord.data for grouper in groupers),
466 names=tuple(grouper.name for grouper in groupers),
467 )
468 # Constructing an index from the product is wrong when there are missing groups
469 # (e.g. binning, resampling). Account for that now.
--> 470 midx = midx[np.sort(pd.unique(_flatcodes[~mask]))]
472 full_index = pd.MultiIndex.from_product(
473 (grouper.full_index.values for grouper in groupers),
474 names=tuple(grouper.name for grouper in groupers),
475 )
476 dim_name = "stacked_" + "_".join(str(grouper.name) for grouper in groupers)
File /home/me/.conda/envs/xarray_2024.09/lib/python3.12/site-packages/pandas/core/indexes/multi.py:2207, in MultiIndex.__getitem__(self, key)
2204 elif isinstance(key, Index):
2205 key = np.asarray(key)
-> 2207 new_codes = [level_codes[key] for level_codes in self.codes]
2209 return MultiIndex(
2210 levels=self.levels,
2211 codes=new_codes,
(...)
2214 verify_integrity=False,
2215 )
IndexError: index 18 is out of bounds for axis 0 with size 18
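The sizes in this IndexError can be reproduced with plain index arithmetic on the example data, where x and y are both 0, 1, ..., 11 along z. The sketch below is pure Python and does not call xarray. Based on the traceback, it assumes that ComposedGrouper.factorize ravels the per-dimension bin codes against the full number of bins (3 for x, 7 for y). It also assumes that the MultiIndex it then indexes is built from each grouper's unique_coord, which holds only the bins that contain data. The helpers bin_codes and ravel are hypothetical and exist only for this illustration. Bins are treated as right-closed, (a, b], as in pandas.cut.

```python
from bisect import bisect_left


def bin_codes(values, edges):
    """Code each value by its right-closed bin (edges[i], edges[i+1]]; -1 if outside all bins."""
    n_bins = len(edges) - 1
    codes = []
    for v in values:
        i = bisect_left(edges, v) - 1
        codes.append(i if 0 <= i < n_bins else -1)
    return codes


def ravel(code_x, code_y, n_y):
    """Row-major flat code for a pair of per-dimension codes (like np.ravel_multi_index)."""
    return code_x * n_y + code_y


values = list(range(12))            # x and y are both 0..11 along "z"
x_edges = list(range(0, 13, 4))     # same edges as np.arange(0, 13, 4): 3 bins
y_edges = list(range(0, 16, 2))     # same edges as np.arange(0, 16, 2): 7 bins
cx = bin_codes(values, x_edges)
cy = bin_codes(values, y_edges)

n_x_full, n_y_full = len(x_edges) - 1, len(y_edges) - 1

# Flat codes computed against the FULL number of bins (3 x 7 = 21 slots).
# Points outside every bin (value 0 here) are dropped, like the masked entries.
flat = sorted({ravel(a, b, n_y_full) for a, b in zip(cx, cy) if a != -1 and b != -1})

# A product index built only from OBSERVED bins has 3 x 6 = 18 entries,
# because the last y bin, (12, 14], is empty.
n_x_obs = len({a for a in cx if a != -1})
n_y_obs = len({b for b in cy if b != -1})
product_size = n_x_obs * n_y_obs

print(flat)          # [0, 1, 9, 10, 18, 19]
print(product_size)  # 18, so positions 18 and 19 do not exist
```

Under the same assumption, when every bin holds at least one point (for example y bins from np.arange(0, 13, 2), giving six bins), the observed and full bin counts coincide and every flat code stays below the product size. This is consistent with the observation that the error disappears when no bins are empty.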
What did you expect to happen?
It should work, even if some bins are empty, just like it works correctly for a single dimension.
Minimal Complete Verifiable Example
In [1]: import numpy as np
   ...: import xarray as xr
   ...: from xarray.groupers import BinGrouper
In [2]: ds = xr.Dataset(
   ...:     {"foo": (("z"), np.random.random_sample(12))},
   ...:     coords={"x": ("z", np.arange(12)), "y": ("z", np.arange(12))},
   ...: )
In [3]: ds.groupby(x=BinGrouper(np.arange(0,13,4)), y=BinGrouper(bins=np.arange(0,16,2)))
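For this example, one would expect six groups, one for each (x bin, y bin) pair that actually contains data. The traceback shows that factorize also builds a full_index from every grouper's full_index. The pure-Python sketch below mimics that idea by looking the flat codes up in a product over all bins (3 x 7 = 21 entries) instead of only the observed ones. It illustrates the expected result under that assumption and is not xarray's implementation. The helpers bin_codes and labels are hypothetical, and the labels only imitate pandas interval notation.

```python
from bisect import bisect_left
from itertools import product


def bin_codes(values, edges):
    """Right-closed bin code for each value; -1 when the value is outside all bins."""
    n_bins = len(edges) - 1
    out = []
    for v in values:
        i = bisect_left(edges, v) - 1
        out.append(i if 0 <= i < n_bins else -1)
    return out


def labels(edges):
    """Interval-style labels such as '(0, 4]' for consecutive edge pairs."""
    return [f"({lo}, {hi}]" for lo, hi in zip(edges[:-1], edges[1:])]


values = list(range(12))
x_edges, y_edges = list(range(0, 13, 4)), list(range(0, 16, 2))
cx, cy = bin_codes(values, x_edges), bin_codes(values, y_edges)
n_y = len(y_edges) - 1

# Product over ALL bins (3 x 7 = 21 entries, x varying slowest), so every
# flat code computed against the full bin counts is a valid position.
full = list(product(labels(x_edges), labels(y_edges)))
flat = sorted({a * n_y + b for a, b in zip(cx, cy) if a != -1 and b != -1})
groups = [full[i] for i in flat]

# Six pairs, from ('(0, 4]', '(0, 2]') to ('(8, 12]', '(10, 12]')
for g in groups:
    print(g)
```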
MVCE confirmation
[X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
[X] Complete example — the example is self-contained, including all data and the text of any traceback.
[X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
[X] New issue — a search of GitHub Issues suggests this is not a duplicate.
[X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
No response
Anything else we need to know?
If we make sure that no bins are empty, it works, e.g.
Also, if we give the same bins as above, but only for a single dimension, it also works:
Environment