pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

AttributeError: module 'xarray' has no attribute 'open_rasterio' #8003

Closed tsnow03 closed 1 year ago

tsnow03 commented 1 year ago

What happened?

Hello! An old version of xarray worked fine with my code, but after an accidental update to 2023.05.0 the code raises this AttributeError when xarray's open_rasterio is called. Downgrading to xarray 2023.03.0 fixed the issue, but all later versions, including 2023.07.0, have the same bug. It seems related to corteva/rioxarray/issues/254. Thanks for all of your help!

What did you expect to happen?

No response

Minimal Complete Verifiable Example

# From JupyterHub normally with authentication via boto3
import pystac_client
import intake
import xarray as xr
import dask
import os
import rasterio as rio

# Define the landsat STAC catalog location
url = 'https://landsatlook.usgs.gov/stac-server'

api = pystac_client.Client.open(url)

items = api.search(
            bbox = (-103.0, -73.5, -102.0, -73.42),
            datetime = '2017-01-01/2017-01-31',
            collections='landsat-c2l1'
        ).item_collection()

# Open STAC catalog
catalog = intake.open_stac_item_collection(items)

img = list(catalog)[0]
item = catalog[img]

# Read in landsat band, specify chunk size
band = item['qa_pixel'](chunks=dict(band=1, x=512, y=512),
                    urlpath=item['qa_pixel'].metadata['alternate']['s3']['href']).to_dask()

Relevant log output

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In [2], line 24
     20 item = catalog[img]
     22 # Read in landsat band, specify chunk size
     23 band = item['qa_pixel'](chunks=dict(band=1, x=512, y=512),
---> 24                     urlpath=item['qa_pixel'].metadata['alternate']['s3']['href']).to_dask()

File /srv/conda/envs/notebook/lib/python3.10/site-packages/intake_xarray/base.py:69, in DataSourceMixin.to_dask(self)
     67 def to_dask(self):
     68     """Return xarray object where variables are dask arrays"""
---> 69     return self.read_chunked()

File /srv/conda/envs/notebook/lib/python3.10/site-packages/intake_xarray/base.py:44, in DataSourceMixin.read_chunked(self)
     42 def read_chunked(self):
     43     """Return xarray object (which will have chunks)"""
---> 44     self._load_metadata()
     45     return self._ds

File /srv/conda/envs/notebook/lib/python3.10/site-packages/intake/source/base.py:285, in DataSourceBase._load_metadata(self)
    283 """load metadata only if needed"""
    284 if self._schema is None:
--> 285     self._schema = self._get_schema()
    286     self.dtype = self._schema.dtype
    287     self.shape = self._schema.shape

File /srv/conda/envs/notebook/lib/python3.10/site-packages/intake_xarray/raster.py:102, in RasterIOSource._get_schema(self)
     99 self.urlpath, *_ = self._get_cache(self.urlpath)
    101 if self._ds is None:
--> 102     self._open_dataset()
    104     ds2 = xr.Dataset({'raster': self._ds})
    105     metadata = {
    106         'dims': dict(ds2.dims),
    107         'data_vars': {k: list(ds2[k].coords)
   (...)
    110         'array': 'raster'
    111     }

File /srv/conda/envs/notebook/lib/python3.10/site-packages/intake_xarray/raster.py:90, in RasterIOSource._open_dataset(self)
     88     self._ds = self._open_files(files)
     89 else:
---> 90     self._ds = xr.open_rasterio(files, chunks=self.chunks,
     91                                 **self._kwargs)

AttributeError: module 'xarray' has no attribute 'open_rasterio'

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.11 | packaged by conda-forge | (main, May 10 2023, 18:58:44) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.10.167-147.601.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.0
libnetcdf: 4.9.2

xarray: 2023.5.0
pandas: 1.5.2
numpy: 1.23.5
scipy: 1.9.3
netCDF4: 1.6.4
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.15.0
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2022.11.0
distributed: 2022.11.0
matplotlib: 3.6.2
cartopy: 0.21.1
seaborn: 0.12.1
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.7.2
pip: 22.3.1
conda: None
pytest: 7.2.0
mypy: None
IPython: 8.6.0
sphinx: 4.5.0


TomNicholas commented 1 year ago

Hi @tsnow03 , nice to see you! :wave:

Xarray's open_rasterio function was deprecated then removed completely in favour of using the backend entrypoint system (see https://github.com/pydata/xarray/issues/4697).
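
To see which backends the entrypoint system can find in a given environment, something like the following sketch may help. It assumes third-party backends register under the `xarray.backends` entry-point group (rioxarray registers one named `rasterio`); the helper name `registered_backend_names` is made up for illustration, and xarray's own built-in engines may not show up in this list.

```python
from importlib.metadata import entry_points


def registered_backend_names():
    """Return the names of xarray backends advertised by installed packages."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are selectable by group.
        group = eps.select(group="xarray.backends")
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
        group = eps.get("xarray.backends", [])
    return sorted({ep.name for ep in group})


# If rioxarray is installed, "rasterio" is expected to be in this list.
print(registered_backend_names())
```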

What this means for users is that they should install rioxarray and use xr.open_dataset(path, engine='rasterio'); see the rioxarray docs on reading files. There should have been a warning raised about this automatically for the past 2 years.
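
As a rough sketch of that migration, a call such as `xr.open_rasterio(path, chunks=...)` could be replaced with a small wrapper like the one below. The name `open_raster` is hypothetical, and the explicit rioxarray check is only there to give a clearer error message. Note that `xr.open_dataset` returns a Dataset, whereas the removed `xr.open_rasterio` returned a DataArray, so downstream code may need adjusting.

```python
import importlib.util


def open_raster(path, chunks=None):
    """Open a raster file as an xarray.Dataset through the 'rasterio' engine.

    Hypothetical helper: the 'rasterio' engine is provided by rioxarray,
    so fail early with a clear message if rioxarray is not installed.
    """
    if importlib.util.find_spec("rioxarray") is None:
        raise ImportError(
            "rioxarray is required for xr.open_dataset(..., engine='rasterio')"
        )
    import xarray as xr  # imported lazily so the check above runs first

    return xr.open_dataset(path, engine="rasterio", chunks=chunks)


# Usage (the path is a placeholder):
# ds = open_raster("scene.tif", chunks=dict(band=1, x=512, y=512))
```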

In your case the offending open_rasterio call is happening inside intake_xarray. You should try updating that package, and if that doesn't work raise this issue again on the intake_xarray repository (exactly how you have done here, this is very clear, thank you).
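
Before and after updating, it can be useful to record which versions are actually installed. The snippet below is a minimal sketch using only the standard library; the helper name `installed_versions` is made up here, and the distribution names (in particular `intake-xarray`) are assumed to match how the packages were installed.

```python
from importlib import metadata


def installed_versions(packages=("xarray", "rioxarray", "intake-xarray")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions())
```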

However I'm surprised I can't find an issue about this there already - perhaps @scottyhq knows?

tsnow03 commented 1 year ago

Ok. Thank you @TomNicholas! I'll work on fixing this.

scottyhq commented 1 year ago

> However I'm surprised I can't find an issue about this there already - perhaps @scottyhq knows?

Development of intake-stac (which uses intake-xarray behind the scenes) is somewhat stalled, the idea being that there are now tools to go directly from STAC catalogs to Xarray without needing intake as an intermediary.

Check out these alternatives:
https://pypi.org/project/xpystac/
https://github.com/opendatacube/odc-stac
https://github.com/gjoseph92/stackstac

@tsnow03, for a single asset like the one you've shown above, here is an alternative:

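import os
import xarray as xr  # engine='rasterio' is provided by rioxarray, which must be installed

# Requires local credentials for USGS Landsat on AWS (requester-pays bucket)
os.environ["AWS_REQUEST_PAYER"] = "requester"

# `items` is the item collection returned by api.search(...) in the example above
asset = items[0].assets['qa_pixel']
href = asset.extra_fields['alternate']['s3']['href']

band = xr.open_dataset(href, engine='rasterio', chunks=dict(band=1, x=512, y=512))
band
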
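
If some assets lack the `alternate` S3 entry, the lookup above raises a KeyError. A defensive variant could look like the sketch below, which works on the plain-dict form of an asset (for example from pystac's `asset.to_dict()`). The function name `s3_href` and the example URLs are illustrative only and not taken from the catalog.

```python
def s3_href(asset: dict) -> str:
    """Return the S3 alternate href of a STAC asset dict, else its main href."""
    try:
        return asset["alternate"]["s3"]["href"]
    except KeyError:
        # No alternate S3 location recorded; fall back to the default href.
        return asset["href"]


# Made-up asset dictionaries, for illustration only.
with_s3 = {
    "href": "https://example.com/scene_QA_PIXEL.TIF",
    "alternate": {"s3": {"href": "s3://example-bucket/scene_QA_PIXEL.TIF"}},
}
without_s3 = {"href": "https://example.com/scene_QA_PIXEL.TIF"}

print(s3_href(with_s3))     # s3://example-bucket/scene_QA_PIXEL.TIF
print(s3_href(without_s3))  # https://example.com/scene_QA_PIXEL.TIF
```
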
tsnow03 commented 1 year ago

Thanks @scottyhq for the code example! You saved me a lot of time. This will get updated in our JupyterBook too.