I added high priority label, since I think this will be quite important to support upcoming virtualized datasets like https://github.com/leap-stc/data-management/issues/118 and others.
It would be nice if we could update the default `engine="zarr"` with a value from either the meta.yaml or the catalog.yaml.
Seems to me this is entirely a catalog 'feature', so I would vote for catalog.yaml
ds = xr.open_dataset(<reference_file_url>, engine="kerchunk", chunks={})
@norlandrhagen, i'm curious... is (engine='kerchunk') all you need to load a reference file into an xarray dataset? i'm trying to figure out what other changes i need to make in https://github.com/carbonplan/html-reprs/blob/094a7992cba029ce284f031623872e624bfafc48/src/app.py#L96
Yup! I think if Kerchunk is installed in the env, it should work.
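In case it helps, a quick way to sanity-check that (a generic snippet, not tied to this repo): if kerchunk is installed, it should show up in xarray's registered backends.

import xarray as xr

# "kerchunk" should appear here once the package is installed in the env
print(list(xr.backends.list_engines()))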
Maybe we can default to engine='zarr' and then, if engine_type exists as a field in the catalog.yaml, we supply it there? Or we could backfill all the catalog.yaml's.
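Roughly something like this (a rough sketch; open_store and the field layout are placeholders, not existing code in this repo):

import xarray as xr

def open_store(entry: dict) -> xr.Dataset:
    # entry is one record from catalog.yaml; fall back to the current default
    # when no engine_type field is present
    engine = entry.get("engine_type", "zarr")
    return xr.open_dataset(entry["url"], engine=engine, chunks={})

# e.g. a backfilled entry for a kerchunk/virtual reference dataset
ds = open_store({
    "url": "https://rice1.osn.mghpcc.org/carbonplan/virtual_datasets/gridmet/gridmet_1979_2020.parquet",
    "engine_type": "kerchunk",
})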
Thanks @andersy005!
perfect! can you point me to existing stores i can use for testing purposes?
Totally!
import xarray as xr

# kerchunk reference file (parquet format) for the gridmet virtual dataset
store = 'https://rice1.osn.mghpcc.org/carbonplan/virtual_datasets/gridmet/gridmet_1979_2020.parquet'
ds = xr.open_dataset(store, engine="kerchunk", chunks={})
ds
I can also put a reference on the LEAP OSN bucket or the LEAP Google storage if that helps.
thank you, @norlandrhagen! i was able to use this for testing purposes
Thanks folks. Great to see all of these improvements moving quickly!
Thinking about the case where we add virtual Zarr reference datasets to the catalog. It would be nice if we could update the default `engine="zarr"` with a value from either the meta.yaml or the catalog.yaml.
ds = xr.open_dataset(<reference_file_url>, engine="kerchunk", chunks={})
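For illustration, the catalog side could look something like this (purely hypothetical field name and layout, not the current schema):

import yaml
import xarray as xr

# hypothetical catalog.yaml entry carrying an engine override
entry = yaml.safe_load("""
url: https://rice1.osn.mghpcc.org/carbonplan/virtual_datasets/gridmet/gridmet_1979_2020.parquet
engine_type: kerchunk
""")

ds = xr.open_dataset(entry["url"], engine=entry.get("engine_type", "zarr"), chunks={})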