Open NathanCummings opened 2 weeks ago
Intake doesn't seem to be clever enough to guess that the URL 'https://s3.echo.stfc.ac.uk/mast/level1/shots/30420.zarr/amc' is zarr, even though the xarray and engine
context make this clear. Currently, the pattern we follow is to guess the file type and then check whether the function called belongs to one of the readers that can act on that type; this only works for file types.
In this case, the function to call is clear, and we know which readers can produce that kind of data:
[_ for _ in intake.readers.utils.subclasses(intake.BaseReader) if "xarray:Dataset" == _.output_instance]
or which readers use that exact function:
[_ for _ in intake.readers.utils.subclasses(intake.BaseReader) if "xarray:open_dataset" == _.func or "xarray:open_dataset" in _.other_funcs]
so it really should be possible to guess this case too.
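To make the idea concrete, here is a rough sketch of matching readers by the function called or the output type instead of by guessed file type. The classes below are toy stand-ins invented for illustration (ToyXArrayReader, ToyPandasCSVReader, and the helper names are hypothetical); they only mimic the func, other_funcs and output_instance attributes used in the list comprehensions above and are not intake's actual implementation.

```python
# Toy stand-ins only: hypothetical classes, NOT intake's real readers.
class BaseReader:
    func = None             # "module:function" string the reader wraps
    other_funcs = set()     # alternative functions the reader may call
    output_instance = None  # "module:Class" string the reader produces


class ToyXArrayReader(BaseReader):
    func = "xarray:open_dataset"
    other_funcs = {"xarray:open_mfdataset"}  # assumed for illustration
    output_instance = "xarray:Dataset"


class ToyPandasCSVReader(BaseReader):
    func = "pandas:read_csv"
    output_instance = "pandas:DataFrame"


def subclasses(cls):
    """Recursively collect every subclass of cls."""
    found = set()
    for sub in cls.__subclasses__():
        found.add(sub)
        found |= subclasses(sub)
    return found


def readers_for_func(func_name):
    """Readers whose main or alternative function matches func_name."""
    hits = [
        r for r in subclasses(BaseReader)
        if r.func == func_name or func_name in r.other_funcs
    ]
    return sorted(hits, key=lambda r: r.__name__)


def readers_for_output(output):
    """Readers that produce the given output type."""
    hits = [r for r in subclasses(BaseReader) if r.output_instance == output]
    return sorted(hits, key=lambda r: r.__name__)


# The function alone is enough to pick a reader; no file-type guess is needed.
print([r.__name__ for r in readers_for_func("xarray:open_dataset")])
```

With a lookup like this, the recommender could fall back to the function name whenever the URL itself does not reveal the data type.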
Of course, you can still construct the reader explicitly:
intake.readers.readers.XArrayDatasetReader(intake.datatypes.Zarr("https://s3.echo.stfc.ac.uk/mast/level1/shots/30420.zarr/amc"), engine="zarr")
Note to self: this should still not be an exception, though; either the recommender should only test for file-like types, or it should not pass storage_options when it's not appropriate.
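A minimal sketch of the second option, assuming the problem is a keyword such as storage_options being handed to a type whose constructor does not accept it. Everything here is hypothetical (accepted_kwargs, try_build, ToyRemoteFile and ToyService are made-up names), and it is not how intake's recommender is actually written.

```python
import inspect


def accepted_kwargs(cls, kwargs):
    """Return only the entries of kwargs that cls.__init__ can accept."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs catch-all means every keyword is acceptable.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


class ToyRemoteFile:
    """Hypothetical file-like type that understands storage_options."""

    def __init__(self, url, storage_options=None):
        self.url = url
        self.storage_options = storage_options or {}


class ToyService:
    """Hypothetical non-file type with no storage_options argument."""

    def __init__(self, url):
        self.url = url


def try_build(cls, url, **kwargs):
    """Instantiate cls, silently dropping keywords it does not accept."""
    return cls(url, **accepted_kwargs(cls, kwargs))


# Both calls succeed; only the file-like type receives storage_options.
remote = try_build(ToyRemoteFile, "s3://bucket/key", storage_options={"anon": True})
service = try_build(ToyService, "https://example.org", storage_options={"anon": True})
```

Dropping keywords silently can hide typos, so a real implementation might prefer to log what was discarded.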
Cool, thank you.
Using:
intake.readers.readers.XArrayDatasetReader(intake.datatypes.Zarr("https://s3.echo.stfc.ac.uk/mast/level1/shots/30420.zarr/amc"), engine="zarr")
worked.
As an extra tip, it took me a beat to realise that I needed to add chunks="auto"
to make xarray use Dask arrays for the variables, so:
reader = intake.readers.readers.XArrayDatasetReader(
    intake.datatypes.Zarr(
        "https://s3.echo.stfc.ac.uk/mast/level1/shots/30420.zarr/amc"
    ),
    engine="zarr",
    chunks="auto",  # need this so xarray will load the variables as dask arrays
)
does what I want.
I'm a bit surprised that "auto" is not the default. Intake is, of course, mostly passing arguments through to the actual library doing the reading.
I'm trying to define a catalog with an Xarray reader for my Zarr files using intake v2. Looking at the available readers, I think the following should work, but I am getting the exception below.
This is a public bucket, and the data are licensed under CC-BY-SA, so this url is fine to use for testing.
I was working through the debugger, following readers.reader_from_call() and into datatypes.recommend(), but I couldn't follow well enough to be sure where things are going wrong.