pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

No attribute `map_over_subtree` #9710

Closed melonora closed 2 weeks ago

melonora commented 2 weeks ago

What happened?

While looking for a way to map a function over the Datasets in a DataTree, I was hit by the issue described in #9693. This is because the node with path '.' does not contain the dimensions I was trying to transpose.

Traceback (most recent call last):
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\IPython\core\interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-38-46c5b7c11604>", line 1, in <module>
    tree = tree.map_over_datasets(Dataset.transpose, ('y', 'x', 'c'))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\xarray\core\datatree.py", line 1462, in map_over_datasets
    return map_over_datasets(func, self, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\xarray\core\datatree_mapping.py", line 103, in map_over_datasets
    results = func_with_error_context(*node_dataset_args)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\xarray\core\datatree_mapping.py", line 133, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\xarray\util\deprecation_helpers.py", line 143, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\xarray\core\dataset.py", line 6415, in transpose
    _ = list(infix_dims(dim, self.dims, missing_dims))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\xarray\namedarray\utils.py", line 171, in infix_dims
    existing_dims = drop_missing_dims(dims_supplied, dims_all, missing_dims)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\xarray\namedarray\utils.py", line 124, in drop_missing_dims
    raise ValueError(
ValueError: Dimensions {('y', 'x', 'c')} do not exist. Expected one or more of FrozenMappingWarningOnValuesAccess({})
Raised whilst mapping function over node with path '.'

Trying map_over_subtree as a workaround did not work either, because the method does not seem to exist in the latest xarray (2024.10.0): I get an AttributeError, even though the documentation says the method exists.

What did you expect to happen?

I expect as output a datatree in which the datasets have their dimensions transposed.

Minimal Complete Verifiable Example

import numpy as np
from dask.array.core import from_array
from xarray import DataTree, DataArray, Dataset

img = from_array(np.random.rand(3, 512,512))
dims = ['c','y','x']
scale_factors = [2,2]
data = DataArray(img, coords={dims[dim_index]: range(img.shape[dim_index]) for dim_index in range(len(dims))} ,dims=('c','y','x'), name="image")

multiscale_data = {
        "scale0": data.to_dataset(name=data.name, promote_attrs=True)
    }

for factor_index, scale_factor in enumerate(scale_factors):
    dim_factors = {'y': scale_factor, 'x': scale_factor}
    downscaled = data.coarsen(dim=dim_factors, boundary="trim", side="right").mean().astype(data.dtype)
    multiscale_data[f"scale{factor_index+1}"] = downscaled.to_dataset(name=data.name, promote_attrs=True)

multiscale_image = DataTree.from_dict(multiscale_data)

# Following leads to error as node with path '.' has no dimensions
multiscale_image = multiscale_image.map_over_datasets(Dataset.transpose, ('y', 'x', 'c'))
# Following leads to error as map_over_subtree does not exist.
multiscale_image = multiscale_image.map_over_subtree(Dataset.transpose, ('y', 'x', 'c'))

Relevant log output

type(element)
Out[31]: xarray.core.datatree.DataTree
element.map_over_subtree(Dataset.transpose, "y","x","c")
Traceback (most recent call last):
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\IPython\core\interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-32-43b28a4a5288>", line 1, in <module>
    element.map_over_subtree(Dataset.transpose, "y","x","c")
    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\xarray\core\common.py", line 302, in __getattr__
    raise AttributeError(
AttributeError: 'DataTree' object has no attribute 'map_over_subtree'

Anything else we need to know?

No response

Environment

C:\ProgramData\miniforge3\envs\xarray_datatree\Lib\site-packages\_distutils_hack\__init__.py:31: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.10 | packaged by conda-forge | (main, Oct 16 2024, 01:17:14) [MSC v.1941 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: AMD64 Family 25 Model 97 Stepping 2, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_Netherlands', '1252')
libhdf5: 1.14.2
libnetcdf: None

xarray: 2024.10.0
pandas: 2.2.3
numpy: 1.26.4
scipy: 1.14.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.12.1
zarr: 2.18.3
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.6.2
distributed: 2024.6.2
matplotlib: 3.9.2
cartopy: None
seaborn: 0.13.2
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: 0.24.3
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.3.0
pip: 24.3.1
conda: None
pytest: 8.3.3
mypy: 1.13.0
IPython: 8.29.0
sphinx: 8.1.3

kmuehlbauer commented 2 weeks ago

@melonora map_over_subtree was removed from API in the process of moving datatree into xarray codebase.

Please use map_over_datasets with one of the workarounds as suggested in #9693 for the time being.

keewis commented 2 weeks ago

Additionally, ds.transpose(("x", "y", "z")) will not work unless you have a dimension named ("x", "y", "z") (i.e. the dimension name is itself a tuple), since Dataset.transpose takes the dimension names as *args.

Given that map_over_subtree intentionally does not exist anymore, I think this is a duplicate of #9693.

Edit: or rather, where in the documentation did you find DataTree.map_over_subtree? If that really still exists I'd call that a documentation bug.
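
To illustrate the point about Dataset.transpose taking dimension names as *args, here is a minimal sketch. The small Dataset and the variable names are made up for illustration and are not taken from the issue; the ValueError matches the traceback in the report.

```python
import numpy as np
import xarray as xr

# Illustrative Dataset with dimensions (c, y, x), similar in shape to the MCVE.
ds = xr.Dataset({"image": (("c", "y", "x"), np.zeros((3, 4, 5)))})

# Dimension names are passed as separate positional arguments.
transposed = ds.transpose("y", "x", "c")

# A tuple is treated as ONE dimension name, which does not exist here.
try:
    ds.transpose(("y", "x", "c"))
    tuple_error = None
except ValueError as err:
    tuple_error = err

# If the order is already stored in a tuple, unpack it with *.
order = ("y", "x", "c")
unpacked = ds.transpose(*order)
```

The same applies when going through map_over_datasets: the extra arguments are forwarded to the function, so the names should be passed as 'y', 'x', 'c' rather than as a single tuple.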

kmuehlbauer commented 2 weeks ago

@melonora @keewis There is no mention of map_over_subtree in the latest stable docs. So maybe the docs you were using were outdated?

@melonora To get you working until #9693 is sorted out, here is a workaround (please also take @keewis's comment on the transpose arguments into account):

import functools
def skip_nodes(func):
    @functools.wraps(func)
    def _func(ds, *args, **kwargs):
        # check if needed dimensions are available in the Dataset
        # otherwise return verbatim
        if not all(arg in ds.dims for arg in args):
            return ds
        return func(ds, *args, **kwargs)
    return _func

@skip_nodes
def transpose(ds, *args, **kwargs):
    return ds.transpose(*args, **kwargs)

multiscale_image = multiscale_image.map_over_datasets(transpose, 'y', 'x', 'c')
kmuehlbauer commented 2 weeks ago

I'll close this as dupe of #9693.

melonora commented 2 weeks ago

Edit: or rather, where in the documentation did you find DataTree.map_over_subtree? If that really still exists I'd call that a documentation bug.

Ah, sorry, I was looking at the xarray_datatree documentation.

melonora commented 2 weeks ago

Thanks! I had a similar workaround for now

eschalkargans commented 2 weeks ago

Hello,

I am currently migrating to 2024.10.0. I encountered some code making use of the former map_over_subtree decorator.

What is the suggested way to migrate such code to map_over_datasets? Is the decorator aspect of it definitely gone?

Thanks for your answer

TomNicholas commented 1 day ago

What is the suggested way to migrate such code to map_over_datasets?

Sorry, apparently I forgot to add this to the migration guide (I've added it in #9804).

Is the decorator aspect of it definitely gone?

Yes, we decided that it was better to have it be consistent with xr.apply_ufunc. If you want decorator-like behaviour you could use functools.partial or just wrap the .map_over_datasets call in a new function.