pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.62k stars 1.08k forks source link

ENH: Throw informative error if missing optional dependency #9554

Open scott-huberty opened 1 month ago

scott-huberty commented 1 month ago

This ticket proposes two separate but overlapping ideas.

1. When trying to write/read Zarr files, throw an informative error if the user does not have zarr installed

just as Xarray does for other optional dependencies like Matplotlib.

Reproducible Code Assuming you only have the base Xarray deps installed: ```python import xarray as xr da = xr.DataArray([1, 2, 3], dims='x', coords={'x': [10, 20, 30]} ) da.plot() # Raises a more informative ImportError da.to_zarr('./test.zarr') # Does not raise an informative ImportError ```

2. Create a utility function to handle optional dependency imports across the Xarray code base.

Doing git grep "raise ImportError" or git grep "raise ModuleNotFoundError" reveals that currently there is a piecemeal approach to handling optional dependencies. It would probably be helpful to have a dedicated function to handle soft dependencies across the code base, and I would be happy to add this If I am going to be adding a PR for point 1 anyways.

Describe the solution you'd like

I work on an applied neuroscience package named MNE-Python, where we have a lot of optional dependencies that are only needed for specific MNE functionalities. So borrowing from MNE, I think something like this would be nice:

# xarray.util.check.py

import importlib

def _soft_import(name, reason, strict=True):
    """Import optional dependencies, providing informative errors on failure."""
    _INSTALL_MAPPING = dict(nc_time_axis="nc-time-axis")
    package_name = _INSTALL_MAPPING.get(name, name)
    try:
        return importlib.import_module(name)
    except (ImportError, ModuleNotFoundError):
        if strict:
            raise ImportError(
                f"for {reason}, the {package_name} package is required. "
                f"Please install it via pip or conda."
            )
        else:
            return None

Then, for example, here, we can instead do:


zarr = _soft_import("zarr", reason="writing zarr files")

And if we don't have zarr installed we'd get:


ImportError: for writing zarr files, the zarr package is required. Please install it via pip or conda.

Describe alternatives you've considered

Open to any ideas 🙂

Additional context

What do folks think about this idea? Is a PR welcome?

welcome[bot] commented 1 month ago

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

max-sixty commented 1 month ago

Yes this sounds like a really nice contribution!

kmuehlbauer commented 1 month ago

I'm totally supporting this idea.

Just to get this into the discussion: in one of the packages I maintain this was extended to allow such scheme for global imports:

https://github.com/wradlib/wradlib/blob/c27369667fbbe771fd0af8db7f4b0cc0066e61c4/wradlib/util.py#L50-L128

kmuehlbauer commented 1 month ago

Found this while researching other options

https://scientific-python.org/specs/spec-0001/

scott-huberty commented 1 month ago

I think that lazy loading and optional dependency handling are separate issues, at least as far as how it is implemented in lazy_loader. For example with lazy_loader I think that top level imports of optional dependencies will still raise an error (if that package is not installed).

For what it's worth MNE-Python is trying out lazy_loader, and my current understanding is that the development team has not reached consensus on it ( see https://github.com/mne-tools/mne-python/pull/12059 )