Closed b-pos465 closed 2 years ago
This already works, but the invocation via intake-xarray (or `xarray.open_dataset` directly) is complex. Actually, intake-xarray is great precisely because it hides this complexity from the user once you've figured it out. Your call should look something like:
```python
source = intake.open_zarr(
    "reference://",
    storage_options={
        "fo": "/home/jovyan/work/output/s3/combine.json",
        "remote_protocol": "...",  # e.g., "s3", "http", ...
        "remote_options": {...},   # anything needed to configure that remote filesystem
    },
    consolidated=False,
)
```
And yes, `open_netcdf` essentially does the same thing, except that you specify the engine and all of those arguments get nested inside `backend_kwargs`.
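To illustrate that nesting, here is a sketch of how the same options might be assembled for the `open_netcdf` form. The `fo` path is the one from above; the `remote_protocol`/`remote_options` values and the helper name are placeholder assumptions, not from the thread:

```python
# Sketch: the same reference-filesystem options as in the open_zarr call,
# but nested under "backend_kwargs". Values marked "placeholder" are
# illustrative assumptions, not taken from the thread.

def netcdf_backend_kwargs(fo, remote_protocol, remote_options):
    """Assemble nested kwargs for an intake.open_netcdf("reference://", ...) call."""
    return {
        "engine": "zarr",  # kerchunk reference sets are read via the zarr engine
        "backend_kwargs": {
            "consolidated": False,
            "storage_options": {
                "fo": fo,
                "remote_protocol": remote_protocol,
                "remote_options": remote_options,
            },
        },
    }

kwargs = netcdf_backend_kwargs(
    fo="/home/jovyan/work/output/s3/combine.json",
    remote_protocol="s3",           # placeholder
    remote_options={"anon": True},  # placeholder
)
# source = intake.open_netcdf("reference://", **kwargs)  # requires intake-xarray
```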
If you succeed in generating an interesting dataset and would like to share it publicly, the kerchunk project would like to know about it!
Thank you for your help! Your approach works perfectly fine.
I was able to generate a YAML file from the source above and load it back in.
Actually, I am not working on a dataset but on a web-based tool for migrating NetCDF4 data to Zarr. It supports both an actual conversion and the JSON-metadata workflow mentioned above. Right now I am working on the Intake integration for the JSON metadata. Here is a link to the repository: https://github.com/climate-v/nc2zarr-webapp
Are you aware of https://pangeo-forge.org/ ?
Use Case
I am trying to access NetCDF4 data via JSON metadata with intake-xarray. This approach is based on a blog post by lsterzinger. My goal is to make the data access as convenient as possible. The ideal solution for me, with the existing API, would look like this:
When testing this approach I get the following error:
The approach from the blog post uses an `FSMap`. So I tried the following:

This one works, but it kind of misses the point of Intake, as the user has to know about the `fsspec` API to create a working `FSMap`.
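The `FSMap` route from the blog post can be sketched roughly as follows. The path and the remote options are placeholders, and the actual open calls are commented out because they need fsspec, s3fs, xarray, and a real `combine.json`:

```python
# Sketch of the fsspec/FSMap route: build a "reference" filesystem from the
# kerchunk JSON, then hand its mapper to xarray. All values are placeholders.

reference_options = {
    "fo": "combine.json",              # kerchunk reference JSON (placeholder path)
    "remote_protocol": "s3",           # protocol of the original NetCDF store
    "remote_options": {"anon": True},  # placeholder remote configuration
}

# The actual calls (need fsspec, s3fs, xarray and a real combine.json):
# import fsspec
# import xarray as xr
# fs = fsspec.filesystem("reference", **reference_options)
# mapper = fs.get_mapper("")  # this is the FSMap that xr.open_zarr consumes
# ds = xr.open_zarr(mapper, consolidated=False)
```

This is exactly the boilerplate a convenience method in intake-xarray could hide.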
Suggestion
Version 1
I would like to implement an extra case for the `open_zarr` method to support the JSON workflow introduced in the blog post mentioned above.

Version 2
I could also imagine an extra method for the JSON workflow, something like `intake.open_zarr_metadata('combine.json')`.
.Questions
Which approach would you prefer?
While looking through existing issues I found #70. If I understand it correctly, you removed the `fsspec` mapper in 2020 because it was no longer needed. Is there another solution for bringing the JSON workflow to intake-xarray that I overlooked?

Unfortunately, my Python knowledge is limited, so I have no idea how to test a modified version of intake-xarray. I found https://intake-xarray.readthedocs.io/en/latest/contributing.html#id9 for running the tests. But how can I test a modified version of intake-xarray with Intake locally? It would be great to have this in the docs!