intake / intake-xarray

Intake plugin for xarray
https://intake-xarray.readthedocs.io/
BSD 2-Clause "Simplified" License

Error reading zarr store from an IPFS gateway #130

Open lgloege opened 2 years ago

lgloege commented 2 years ago

Thank you for building intake-xarray, this is an awesome package! I am having trouble reading a zarr datastore via an IPFS gateway. I am trying to read a NOAA SST dataset with an intake catalog.

This code reads the NOAA SST dataset with xarray alone:

import xarray as xr
zarr_store = "https://dweb.link/ipfs/bafybeiepna7ilkhdwykd65i7aovmrapnmaifullesxg5pdsjy6cos5qfrm/"
ds = xr.open_dataset(zarr_store, engine="zarr")

This code works as expected. Now I want to build an intake catalog to read the same store. I wrote the following simple catalog, catalog_ipfs.yaml:

---
plugins:
  source:
    - module: intake_xarray

sources:
  SST:
    args:
      urlpath: https://dweb.link/ipfs/bafybeiepna7ilkhdwykd65i7aovmrapnmaifullesxg5pdsjy6cos5qfrm/
      xarray_kwargs:
        engine: zarr
    driver: intake_xarray.netcdf.NetCDFSource

I then try reading data from it using the following code:

import intake
import xarray as xr
catalog = './catalog_ipfs.yaml'
cat = intake.open_catalog(catalog)
ds = cat.SST().read()

When I run this code, I get this ValueError:

ValueError: Starting with Zarr 2.11.0, stores must be subclasses of BaseStore, if your store exposes the MutableMapping interface wrap it in Zarr.storage.KVStore. Got <File-like object HTTPFileSystem, https://dweb.link/ipfs/bafybeiepna7ilkhdwykd65i7aovmrapnmaifullesxg5pdsjy6cos5qfrm/>

Any thoughts on how to resolve this issue? I am confused because I thought intake_xarray was just calling xr.open_dataset("https://dweb.link/ipfs/bafybeiepna7ilkhdwykd65i7aovmrapnmaifullesxg5pdsjy6cos5qfrm/", engine="zarr") in the background, which I know works from the example above.

Here is the version of each package I am using:

intake.__version__ = '0.6.6'
intake_xarray.__version__ = '0.6.1'
xr.__version__ = '2022.6.0'
zarr.__version__ = '2.12.0'

I appreciate any help you can provide, thanks!

observingClouds commented 2 years ago

Hi @lgloege, I'm excited to see that more and more data is hosted on IPFS. Have you tried accessing this data directly via the IPFS protocol rather than via HTTP? Here is an example of what the syntax looks like. This way you would also be independent of the gateway. You will need to install ipfsspec, though.
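
As a rough sketch of what switching from the gateway to the IPFS protocol could look like: the helper below, gateway_to_ipfs_url, is a hypothetical name I made up for illustration, and it assumes a path-style gateway URL of the form https://<host>/ipfs/<cid>/. The commented-out xarray call assumes that ipfsspec is installed and has registered the ipfs:// protocol with fsspec; I have not run it against this dataset.

```python
from urllib.parse import urlparse


def gateway_to_ipfs_url(gateway_url: str) -> str:
    """Convert https://<host>/ipfs/<cid>/<path> into ipfs://<cid>/<path>.

    Hypothetical helper for illustration; only path-style gateway URLs
    (with "/ipfs/" as the first path segment) are handled.
    """
    path = urlparse(gateway_url).path
    # Drop empty segments produced by leading and trailing slashes.
    segments = [s for s in path.split("/") if s]
    if len(segments) < 2 or segments[0] != "ipfs":
        raise ValueError(f"not a path-style IPFS gateway URL: {gateway_url!r}")
    return "ipfs://" + "/".join(segments[1:])


zarr_store = "https://dweb.link/ipfs/bafybeiepna7ilkhdwykd65i7aovmrapnmaifullesxg5pdsjy6cos5qfrm/"
ipfs_url = gateway_to_ipfs_url(zarr_store)
print(ipfs_url)

# Assumption: with ipfsspec installed, the call would then look roughly like
#   import xarray as xr
#   ds = xr.open_dataset(ipfs_url, engine="zarr")
```

The same ipfs:// URL could then go into the urlpath of the catalog entry instead of the dweb.link address.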

observingClouds commented 2 years ago

driver: intake_xarray.netcdf.NetCDFSource also looks suspicious. I would rather have written:

sources:
  SST:
    args:
      urlpath: https://dweb.link/ipfs/bafybeiepna7ilkhdwykd65i7aovmrapnmaifullesxg5pdsjy6cos5qfrm/
    driver: zarr
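
For completeness, a minimal sketch of how the corrected catalog could be written and opened from Python. The function names write_catalog and open_sst are hypothetical and only for illustration. My understanding, which the "File-like object HTTPFileSystem" in the error message seems to support, is that the NetCDF driver opens the URL as a single file and hands that file object to xarray, whereas a zarr store is a collection of many keys, so the zarr driver is probably the right choice here. The intake call is not executed at import time and has not been tested against this gateway.

```python
import tempfile
import textwrap
from pathlib import Path

URLPATH = "https://dweb.link/ipfs/bafybeiepna7ilkhdwykd65i7aovmrapnmaifullesxg5pdsjy6cos5qfrm/"


def write_catalog(path: Path, urlpath: str = URLPATH) -> str:
    """Write a one-source catalog that uses the zarr driver; return its text."""
    text = textwrap.dedent(
        f"""\
        sources:
          SST:
            args:
              urlpath: {urlpath}
            driver: zarr
        """
    )
    path.write_text(text)
    return text


def open_sst(catalog_path: Path):
    """Open the SST source lazily (assumes intake and intake-xarray are installed)."""
    import intake  # imported here so the rest of the sketch runs without intake

    cat = intake.open_catalog(str(catalog_path))
    return cat.SST.to_dask()  # needs network access to the gateway


catalog_path = Path(tempfile.mkdtemp()) / "catalog_ipfs.yaml"
catalog_text = write_catalog(catalog_path)
print(catalog_text)
```

Calling open_sst(catalog_path) should then return the dataset without loading it fully into memory, whereas .read() as in the original snippet loads everything.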