intake / intake-esm

An intake plugin for parsing an Earth System Model (ESM) catalog and loading assets into xarray datasets.
https://intake-esm.readthedocs.io
Apache License 2.0

Better support for datatree / kerchunk #594

Open dcherian opened 1 year ago

dcherian commented 1 year ago

I've been using kerchunk to generate aggregated datasets that have a Zarr group for each "stream" (this could be data on different grids and at different frequencies, e.g. full depth grid monthly means, and daily mean surface data).

I've been storing them as kerchunk reference files, which works well.
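As a rough illustration of that layout, here is a minimal sketch that nests per-stream references under one Zarr group per stream inside a single kerchunk reference set (version 1 format). The helper name `combine_streams`, the variable names, and the file paths are hypothetical; the per-stream inputs are assumed to be the "refs" mappings of already-generated kerchunk references with Zarr v2 keys.

```python
import json


def combine_streams(stream_refs):
    """Nest per-stream kerchunk references under one Zarr group per stream.

    stream_refs maps a stream name (e.g. "h", "sfc") to the "refs" mapping
    of an existing reference set. Hypothetical helper, not part of kerchunk.
    """
    # The root of the combined store is itself a Zarr v2 group.
    combined = {".zgroup": json.dumps({"zarr_format": 2})}
    for stream, refs in stream_refs.items():
        for key, value in refs.items():
            # "temp/0.0.0" in stream "h" becomes "h/temp/0.0.0".
            combined[f"{stream}/{key}"] = value
    return {"version": 1, "refs": combined}


# Toy references: real ones would come from kerchunk; these paths are made up.
h_refs = {
    ".zgroup": json.dumps({"zarr_format": 2}),
    "temp/0.0.0": ["h_monthly.nc", 1024, 4096],
}
sfc_refs = {
    ".zgroup": json.dumps({"zarr_format": 2}),
    "sst/0.0": ["sfc_daily.nc", 2048, 512],
}

combined = combine_streams({"h": h_refs, "sfc": sfc_refs})
print(sorted(combined["refs"]))
```

Each stream keeps its own `.zgroup` entry under its prefix, so a tree-aware reader can treat `h` and `sfc` as separate groups with their own grids and frequencies.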

I'd like to put a single entry per simulation in an intake-esm catalog and read it with datatree.open_datatree.

I think I have two requests:

  1. Turn off aggregation, which seems to be a common request. I'd rather do the aggregation "at write-time" by creating an appropriate JSON file that takes care of various idiosyncrasies (e.g. merging in "static variables") instead of pushing it to the user at read-time.
  2. An entry in the catalog that switches between using xr.open_dataset and datatree.open_datatree. Eventually, there will be an xr.open_datatree, but the underlying concept of two different functions to open a group vs. a full tree will still be around.
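The second request could look roughly like the sketch below. It is only an illustration: the `open_as` field and the `pick_opener` helper are hypothetical and are not existing intake-esm features, and the two openers are stand-ins so the example runs without xarray or datatree installed. In practice they would be xr.open_dataset and datatree.open_datatree.

```python
def pick_opener(entry, openers, default="dataset"):
    """Return the opener function named by the entry's hypothetical `open_as` field."""
    kind = entry.get("open_as", default)
    try:
        return openers[kind]
    except KeyError:
        raise ValueError(f"unknown open_as value: {kind!r}") from None


# Stand-in openers; real code would pass xr.open_dataset and
# datatree.open_datatree here instead.
def fake_open_dataset(path):
    return ("dataset", path)


def fake_open_datatree(path):
    return ("datatree", path)


openers = {"dataset": fake_open_dataset, "datatree": fake_open_datatree}

# A catalog row for the aggregated store; the path is made up.
entry = {"path": "combined.json", "stream": "combined", "open_as": "datatree"}
result = pick_opener(entry, openers)(entry["path"])
print(result)
```

Keeping the choice in the catalog row means the same catalog can mix single-group entries and full-tree entries without the user having to know which is which.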
dcherian commented 1 year ago

Here's a catalog where there is an entry for each "stream" (h, sfc, wci) and an aggregated dataset with stream="combined".

I'd like to pick some simulations and load the combined stream as a datatree.
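As a rough sketch of that selection, assuming the catalog can be viewed as a list of rows: only the `stream` column and its values come from the catalog described above, while the `simulation` column name, the run names, and the paths are made up for illustration.

```python
def select_combined(rows, simulations):
    """Keep only the stream="combined" rows for the requested simulations."""
    wanted = set(simulations)
    return [
        row
        for row in rows
        if row["stream"] == "combined" and row["simulation"] in wanted
    ]


# Toy catalog rows; a real catalog would hold one row per stream per simulation.
rows = [
    {"simulation": "run-a", "stream": "h", "path": "run-a/h.json"},
    {"simulation": "run-a", "stream": "combined", "path": "run-a/combined.json"},
    {"simulation": "run-b", "stream": "sfc", "path": "run-b/sfc.json"},
    {"simulation": "run-b", "stream": "combined", "path": "run-b/combined.json"},
    {"simulation": "run-c", "stream": "combined", "path": "run-c/combined.json"},
]

selected = select_combined(rows, ["run-a", "run-b"])
paths = [row["path"] for row in selected]
print(paths)
```

Each selected path would then be handed to a tree-aware opener such as datatree.open_datatree, one tree per simulation.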

dcherian commented 1 year ago

Do you have any thoughts on how to do this?

andersy005 commented 1 year ago

One step closer with

This should enable the following

turn off aggregation, which seems to be a common request. I'd rather do the aggregation "at write-time" by creating an appropriate JSON file that takes care of various idiosyncrasies (e.g. merging in "static variables") instead of pushing it to the user at read-time.

andersy005 commented 1 year ago

regarding

an entry in the catalog that switches between using xr.open_dataset and datatree.open_datatree. Eventually, there will be an xr.open_datatree, but the underlying concept of two different functions to open a group vs. a full tree will still be around.

I haven't had a chance to look into possible options. I intend to get back to you next week with some ideas :)