pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

DataTree.isel called with missing_dims fails when a subtree is missing the dimension #9717

Closed sjperkins closed 2 weeks ago

sjperkins commented 2 weeks ago

What happened?

In 2024.09.0, subtrees appeared to inherit all parent coordinates, so the DataTree.isel statement in the MCVE worked. In 2024.10.0 this inheritance no longer seems to apply, so DataTree.isel fails when the indexers name a parent dimension that is absent from the subtree. DataTree.isel(..., missing_dims="ignore") fails as well.

What did you expect to happen?

I'm not sure if the 2024.09.0 coordinate inheritance should still apply if the inherited coordinates aren't present on the subtree.

However, DataTree.isel(..., missing_dims="ignore") should work.

This is likely failing because missing_dims is not passed through in apply_indexers here:

https://github.com/pydata/xarray/blob/038436365d4757a322cac37307503c132d1fe2a7/xarray/core/datatree.py#L1831-L1836
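To make the expectation concrete, here is a minimal pure-Python sketch of what forwarding missing_dims to each node would mean. It does not use xarray: `filter_indexers` and `isel_tree` are hypothetical names, the tree is modelled as a plain dict from node path to dimension sizes, and the handling of "raise", "warn" and "ignore" is only loosely modelled on the `drop_dims_from_indexers` frames shown in the traceback below.

```python
import warnings


def filter_indexers(indexers, dims, missing_dims="raise"):
    """Hypothetical stand-in for per-node handling of absent dimensions."""
    invalid = indexers.keys() - set(dims)
    if missing_dims == "raise":
        if invalid:
            raise ValueError(
                f"Dimensions {invalid} do not exist. "
                f"Expected one or more of {tuple(dims)}"
            )
        return dict(indexers)
    if missing_dims == "warn":
        if invalid:
            warnings.warn(f"Dimensions {invalid} do not exist.")
    elif missing_dims != "ignore":
        raise ValueError(f"Unrecognised option {missing_dims!r} for missing_dims")
    # "warn" and "ignore" both drop indexers the node cannot use.
    return {k: v for k, v in indexers.items() if k not in invalid}


def isel_tree(tree, indexers, missing_dims="raise"):
    """Apply the same policy to every node of a toy tree.

    `tree` maps a node path to a dict of dimension sizes; the result maps
    each path to the indexers that would actually be applied there.
    """
    return {
        path: filter_indexers(indexers, dims, missing_dims)
        for path, dims in tree.items()
    }


# Same shapes as the MCVE: "x" exists on /A but not on /A/S.
tree = {"/A": {"x": 1000, "y": 64, "z": 4}, "/A/S": {"a": 32, "b": 3}}
selected = isel_tree(tree, {"x": slice(10, 20)}, missing_dims="ignore")
```

With missing_dims="ignore", `/A` receives the `x` slice and `/A/S` receives no indexers, so nothing raises. With the default "raise", the call fails on `/A/S`, which matches the error reported here.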

Minimal Complete Verifiable Example

>>> import xarray
>>> import numpy as np

>>> ds = xarray.Dataset({"time": ("x", np.ones(1000)), "data": (("x", "y", "z"), np.ones((1000, 64, 4)))})
>>> subds = xarray.Dataset({"pos": (("a", "b"), np.ones((32, 3)))})
>>> dt = xarray.DataTree.from_dict({"A": ds, "A/S": subds})
>>> dt
<xarray.DataTree>
Group: /
└── Group: /A
    │   Dimensions:  (x: 1000, y: 64, z: 4)
    │   Dimensions without coordinates: x, y, z
    │   Data variables:
    │       time     (x) float64 8kB 1.0 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0 1.0
    │       data     (x, y, z) float64 2MB 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
    └── Group: /A/S
            Dimensions:  (a: 32, b: 3)
            Dimensions without coordinates: a, b
            Data variables:
                pos      (a, b) float64 768B 1.0 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0

>>> dt.isel(x=slice(10, 20))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[8], line 1
----> 1 dt.isel(x=slice(10, 20))

File ~/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/xarray/core/datatree.py:1821, in DataTree.isel(self, indexers, drop, missing_dims, **indexers_kwargs)
   1818     return dataset.isel(node_indexers, drop=drop)
   1820 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "isel")
-> 1821 return self._selective_indexing(
   1822     apply_indexers, indexers, missing_dims=missing_dims
   1823 )

File ~/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/xarray/core/datatree.py:1745, in DataTree._selective_indexing(self, func, indexers, missing_dims)
   1743 for path, node in self.subtree_with_keys:
   1744     node_indexers = {k: v for k, v in indexers.items() if k in node.dims}
-> 1745     node_result = func(node.dataset, node_indexers)
   1746     # Indexing datasets corresponding to each node results in redundant
   1747     # coordinates when indexes from a parent node are inherited.
   1748     # Ideally, we would avoid creating such coordinates in the first
   1749     # place, but that would require implementing indexing operations at
   1750     # the Variable instead of the Dataset level.
   1751     if node is not self:

File ~/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/xarray/core/datatree.py:1818, in DataTree.isel.<locals>.apply_indexers(dataset, node_indexers)
   1817 def apply_indexers(dataset, node_indexers):
-> 1818     return dataset.isel(node_indexers, drop=drop)

File ~/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/xarray/core/dataset.py:3074, in Dataset.isel(self, indexers, drop, missing_dims, **indexers_kwargs)
   3070     return self._isel_fancy(indexers, drop=drop, missing_dims=missing_dims)
   3072 # Much faster algorithm for when all indexers are ints, slices, one-dimensional
   3073 # lists, or zero or one-dimensional np.ndarray's
-> 3074 indexers = drop_dims_from_indexers(indexers, self.dims, missing_dims)
   3076 variables = {}
   3077 dims: dict[Hashable, int] = {}

File ~/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/xarray/core/utils.py:804, in drop_dims_from_indexers(indexers, dims, missing_dims)
    802     invalid = indexers.keys() - set(dims)
    803     if invalid:
--> 804         raise ValueError(
    805             f"Dimensions {invalid} do not exist. Expected one or more of {dims}"
    806         )
    808     return indexers
    810 elif missing_dims == "warn":
    811     # don't modify input

ValueError: Dimensions {'x'} do not exist. Expected one or more of FrozenMappingWarningOnValuesAccess({'a': 32, 'b': 3})

>>> dt.isel(x=slice(10, 20), missing_dims="ignore")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[9], line 1
----> 1 dt.isel(x=slice(10, 20), missing_dims="ignore")

File ~/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/xarray/core/datatree.py:1821, in DataTree.isel(self, indexers, drop, missing_dims, **indexers_kwargs)
   1818     return dataset.isel(node_indexers, drop=drop)
   1820 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "isel")
-> 1821 return self._selective_indexing(
   1822     apply_indexers, indexers, missing_dims=missing_dims
   1823 )

File ~/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/xarray/core/datatree.py:1745, in DataTree._selective_indexing(self, func, indexers, missing_dims)
   1743 for path, node in self.subtree_with_keys:
   1744     node_indexers = {k: v for k, v in indexers.items() if k in node.dims}
-> 1745     node_result = func(node.dataset, node_indexers)
   1746     # Indexing datasets corresponding to each node results in redundant
   1747     # coordinates when indexes from a parent node are inherited.
   1748     # Ideally, we would avoid creating such coordinates in the first
   1749     # place, but that would require implementing indexing operations at
   1750     # the Variable instead of the Dataset level.
   1751     if node is not self:

File ~/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/xarray/core/datatree.py:1818, in DataTree.isel.<locals>.apply_indexers(dataset, node_indexers)
   1817 def apply_indexers(dataset, node_indexers):
-> 1818     return dataset.isel(node_indexers, drop=drop)

File ~/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/xarray/core/dataset.py:3074, in Dataset.isel(self, indexers, drop, missing_dims, **indexers_kwargs)
   3070     return self._isel_fancy(indexers, drop=drop, missing_dims=missing_dims)
   3072 # Much faster algorithm for when all indexers are ints, slices, one-dimensional
   3073 # lists, or zero or one-dimensional np.ndarray's
-> 3074 indexers = drop_dims_from_indexers(indexers, self.dims, missing_dims)
   3076 variables = {}
   3077 dims: dict[Hashable, int] = {}

File ~/.cache/pypoetry/virtualenvs/xarray-ms-jDhc3Ane-py3.11/lib/python3.11/site-packages/xarray/core/utils.py:804, in drop_dims_from_indexers(indexers, dims, missing_dims)
    802     invalid = indexers.keys() - set(dims)
    803     if invalid:
--> 804         raise ValueError(
    805             f"Dimensions {invalid} do not exist. Expected one or more of {dims}"
    806         )
    808     return indexers
    810 elif missing_dims == "warn":
    811     # don't modify input

ValueError: Dimensions {'x'} do not exist. Expected one or more of FrozenMappingWarningOnValuesAccess({'a': 32, 'b': 3})

Environment

INSTALLED VERSIONS
-------------------------------
commit: None
python: 3.11.10 (main, Sep 7 2024, 18:35:41) [GCC 13.2.0]
python-bits: 64
OS: Linux
OS-release: 6.8.0-48-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_ZA.UTF-8
LOCALE: ('en_ZA', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.10.0
pandas: 2.2.3
numpy: 2.1.3
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 2.18.3
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.10.0
distributed: 2024.10.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 74.1.2
pip: 24.2
conda: None
pytest: 8.3.3
mypy: None
IPython: 8.29.0
sphinx: 8.1.3

sjperkins commented 2 weeks ago

I think I've confused myself converting my original issue into an MCVE. Closing for now.