pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Can't open datasets with the `rasterio` engine. #7831

Open simonrp84 opened 1 year ago

simonrp84 commented 1 year ago

### What happened?

Hello, when using this command: `data = xr.open_dataset(my_filename, engine="rasterio")`

I get an error: `ValueError: unrecognized engine rasterio must be one of: ['netcdf4', 'scipy', 'store', 'zarr']`

This error is generated because I don't have rioxarray installed. However, that's not clear from the message and the user is likely to assume that it's because they don't have rasterio installed. Would it be possible to improve this error message to allow the user to see that they require rioxarray?

### What did you expect to happen?

An error message to be displayed that helps the user understand which package is missing. Something like:

ValueError: unrecognized engine rasterio must be one of: [engines]. The rasterio engine requires rioxarray to be installed.

### Minimal Complete Verifiable Example

To make a new conda env:

conda create --name xrtesting
conda activate xrtesting
conda install xarray rasterio

Then, to generate the error:

import xarray as xr
my_filename = 'test.tif' # This triggers the error even if the file is not present
data = xr.open_dataset(my_filename, engine="rasterio")


### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

_No response_

### Anything else we need to know?

_No response_

### Environment

<details>
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.3 | packaged by conda-forge | (main, Apr  6 2023, 08:57:19) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-162-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2023.4.2
pandas: 2.0.1
numpy: 1.24.3
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.7.2
pip: 23.1.2
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None

</details>

dcherian commented 1 year ago

I think this would be nice since we recently removed the rasterio backend.

headtr1ck commented 1 year ago

I don't know how we would implement that; it's probably not a good idea to special-case all external backends within xarray.

Either the package is installed, and then it works, or it is not installed, and then we don't know which backend/package is missing.
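To illustrate the point above: external backends are discovered through Python entry points in the `xarray.backends` group, so an engine whose package is not installed simply never shows up. The sketch below lists whatever is registered in the current environment; the helper name `discovered_backends` is made up for illustration and is not part of xarray.

```python
from importlib.metadata import entry_points


def discovered_backends(group="xarray.backends"):
    """Map entry-point names in `group` to their import targets.

    Hypothetical helper, not an xarray function. A package that is not
    installed contributes no entry point, so nothing here can reveal
    which package a missing engine would have come from.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry points
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9: dict of group name -> list of entry points
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    for name, target in sorted(discovered_backends().items()):
        print(f"{name}: {target}")
```

With rioxarray installed, a `rasterio` entry should appear in this listing; without it, there is no trace of the engine at all.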

dcherian commented 1 year ago

I was suggesting special-casing only rioxarray, just because we recently deleted the rasterio backend and that might ease the transition. Can we do it in the top-level `open_dataset` when `engine == "rasterio"` but rioxarray is not importable?
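A minimal sketch of that special case, assuming the check runs at the point where the unrecognized-engine error is built. The function name, the `engine_packages` mapping, and its default are hypothetical and not taken from xarray's code base; the message wording follows the one proposed in the issue.

```python
import importlib.util


def unrecognized_engine_message(engine, available, engine_packages=None):
    """Build the error text, adding a hint when a known package is missing.

    `engine_packages` maps an engine name to the top-level package that
    provides it. The default below is an assumption for this sketch.
    """
    if engine_packages is None:
        engine_packages = {"rasterio": "rioxarray"}

    message = f"unrecognized engine {engine} must be one of: {sorted(available)}"

    package = engine_packages.get(engine)
    # find_spec returns None when a top-level package cannot be imported
    if package is not None and importlib.util.find_spec(package) is None:
        message += f". The {engine} engine requires {package} to be installed."
    return message


if __name__ == "__main__":
    print(unrecognized_engine_message("rasterio", ["netcdf4", "scipy", "store", "zarr"]))
```

Keeping the mapping to a single entry matches the suggestion of special-casing only rioxarray rather than every external backend.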

kmuehlbauer commented 1 year ago

Maybe it would also help to rephrase the error, something along the lines of:

"Engine rasterio is not available. Please install the needed package. Engines [xxx, yyy, zzz] are available."

kmuehlbauer commented 1 year ago

Yet another idea would be to add an Engines heading on https://docs.xarray.dev/en/stable/ecosystem.html where engines/backends and their respective packages can be listed. The error could include a link to that page.
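As a rough sketch of the link idea, the message could be assembled like this. The function name and the exact wording are assumptions, and the Engines section on the ecosystem page is the proposal from this comment, not something confirmed to exist.

```python
# URL taken from the comment above; an "Engines" section there is only proposed.
ECOSYSTEM_URL = "https://docs.xarray.dev/en/stable/ecosystem.html"


def engine_not_available_message(engine, available):
    """Hypothetical error text that points users at the ecosystem page."""
    return (
        f"Engine {engine!r} is not available. "
        f"Engines {sorted(available)} are available. "
        f"See {ECOSYSTEM_URL} for engines provided by other packages."
    )


if __name__ == "__main__":
    print(engine_not_available_message("rasterio", ["netcdf4", "scipy", "store", "zarr"]))
```

This keeps xarray free of per-engine knowledge: the mapping from engine to package lives in the documentation instead of the code.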

simonrp84 commented 1 year ago

Thanks for the replies. Yes, that second suggestion sounds good @kmuehlbauer!

I realise it's not practical to add specific checks / messages for all engines, so something like this that links to a webpage describing potential solutions seems like an excellent compromise. I think your earlier suggestion (rephrasing the error) would not help, however, as it still doesn't show users which package is actually missing: rioxarray rather than rasterio.