xarray-contrib / cf-xarray

an accessor for xarray objects that interprets CF attributes
https://cf-xarray.readthedocs.io/
Apache License 2.0

Decoder for MultiIndexes fails if other variables use a dimension that is part of the MultiIndex #461

Open okz opened 1 year ago

okz commented 1 year ago

First, thank you so much. Compression-by-gathering is an incredibly useful addition, which will hopefully end up in xarray one day to support ragged (or sparse) arrays in netCDF files.

#321 added support for encoding and decoding pandas MultiIndexes using "compression by gathering". However, if the dataset contains other variables that use a dimension which is part of the MultiIndex, decoding fails.

A minimal example, derived from the Encoding and decoding tutorial with a single added line that defines var_with_lat:
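For readers unfamiliar with the technique, here is a minimal NumPy sketch of the idea behind compression by gathering: each data point is stored as one flat, zero-based, row-major index into the uncompressed grid. The grid shape and point positions are hypothetical, and this is only an illustration of the concept, not cf-xarray's implementation.

```python
import numpy as np

# Hypothetical uncompressed grid: 2 "lat" positions by 3 "lon" positions.
shape = (2, 3)

# Positions of the points that actually hold data, one (lat, lon) pair per point.
lat_idx = np.array([0, 0, 1, 1])
lon_idx = np.array([0, 2, 1, 2])

# Encode: one flat, zero-based, row-major index per point.
landpoint = np.ravel_multi_index((lat_idx, lon_idx), shape)

# Decode: recover the per-dimension positions from the flat indices.
decoded_lat, decoded_lon = np.unravel_index(landpoint, shape)

print(landpoint.tolist())  # [0, 2, 4, 5]
```

Only four of the six grid cells are stored here, which is what makes the representation attractive for sparse or ragged data.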

import cf_xarray as cfxr
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"landsoilt": ("landpoint", np.random.randn(4), {"foo": "bar"})},
    {
        "landpoint": pd.MultiIndex.from_product(
            [["a", "b"], [1, 2]], names=("lat", "lon")
        )
    },
)

# Uncommenting the next line makes decoding fail:
# ds["var_with_lat"] = xr.DataArray([1, 2], dims="lat")

encoded = cfxr.encode_multi_index_as_compress(ds, "landpoint")
decoded = cfxr.decode_compress_to_multi_index(encoded, "landpoint")

Once var_with_lat is added, decoding fails:

---> 129 decoded = cfxr.decode_compress_to_multi_index(encoded, "landpoint")

File ~/devenv3/lib/python3.11/site-packages/cf_xarray/coding.py:116, in decode_compress_to_multi_index(encoded, idxnames)
    110     from xarray.indexes import PandasMultiIndex
    112     variables = {
    113         dim: encoded[dim].isel({dim: xr.Variable(data=index, dims=idxname)})
    114         for dim, index in zip(names, indices)
    115     }
--> 116     decoded = decoded.assign_coords(variables).set_xindex(
    117         names, PandasMultiIndex
    118     )
    119 except ImportError:
    120     arrays = [encoded[dim].data[index] for dim, index in zip(names, indices)]

File ~/devenv3/lib/python3.11/site-packages/xarray/core/dataset.py:4330, in Dataset.set_xindex(self, coord_names, index_cls, **options)
   4327 indexed_coords = set(coord_names) & set(self._indexes)
   4329 if indexed_coords:
-> 4330     raise ValueError(
   4331         f"those coordinates already have an index: {indexed_coords}"
   4332     )
   4334 coord_vars = {name: self._variables[name] for name in coord_names}
   4336 index = index_cls.from_variables(coord_vars, options=options)

ValueError: those coordinates already have an index: {'lat'}
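
The final ValueError can be reproduced in isolation with plain xarray, which may help narrow things down. The sketch below uses a hypothetical one-coordinate dataset and only illustrates the precondition that set_xindex enforces (the coordinate must not already carry an index). It is not a diagnosis of where the decoder picks up the existing index, and the drop_indexes call is not a proposed fix for cf-xarray.

```python
import xarray as xr

# Hypothetical dataset: "lat" is a dimension coordinate,
# so xarray gives it a default index on construction.
ds = xr.Dataset(coords={"lat": [1, 2]})

try:
    ds.set_xindex("lat")
    error_message = None
except ValueError as err:
    # set_xindex refuses coordinates that already have an index.
    error_message = str(err)

print(error_message)

# Dropping the default index first lets set_xindex succeed.
reindexed = ds.drop_indexes("lat").set_xindex("lat")
```

This suggests that, by the time the decoder calls set_xindex, the decoded dataset still carries an index on lat, presumably because var_with_lat keeps lat alive as an indexed dimension.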