zarr-developers / VirtualiZarr

Create virtual Zarr stores from archival data files using xarray syntax
https://virtualizarr.readthedocs.io/en/latest/
Apache License 2.0

allow writing `DataTree` objects containing references to disk #244

Open keewis opened 2 weeks ago

keewis commented 2 weeks ago

While trying to create a nicer way to access ocean model output (several stacks of netCDF files, where each stack can be concatenated but the stacks cannot necessarily be merged into a single Dataset object), I've been able to construct a DataTree object:

hourly = xr.concat([virtualizarr.open_virtual_dataset(path, ...) for path in paths], dim="time", ...)
daily = xr.concat([virtualizarr.open_virtual_dataset(path, ...) for path in paths], dim="time", ...)
monthly = xr.concat([virtualizarr.open_virtual_dataset(path, ...) for path in paths], dim="time", ...)

tree = DataTree.from_dict({"/": ..., "/hourly": hourly, "/daily": daily, "/monthly": monthly})

but would then need a way to write that tree to disk.

The current file formats (except maybe parquet, I'm not sure) definitely support this, since they're based on zarr. We'd just need to create a DataTree accessor and write the code to serialize DataTree objects containing ManifestArrays.
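To illustrate why the zarr-based layout makes this feasible, here is a rough sketch for the kerchunk-style JSON case, assuming each node of the tree can already be serialized to its own reference dict of the form `{"version": 1, "refs": {...}}`. The function `merge_group_refs`, the variable names, and the file names are hypothetical and not part of VirtualiZarr; the idea is only that each group's keys get prefixed with the group path and every group gets a `.zgroup` entry.

```python
import json

# Zarr v2 marks a group with a ".zgroup" key holding this JSON document.
ZGROUP = json.dumps({"zarr_format": 2})


def merge_group_refs(group_refs):
    """Combine per-group reference dicts into one nested reference dict.

    `group_refs` maps a group path such as "/hourly" to a kerchunk-style
    dict {"version": 1, "refs": {...}}. This is a hypothetical helper,
    not an existing VirtualiZarr function.
    """
    merged = {}
    for group_path, refs in group_refs.items():
        parts = [p for p in group_path.split("/") if p]
        # The root and every ancestor group need a ".zgroup" marker.
        for depth in range(len(parts) + 1):
            ancestor = "/".join(parts[:depth])
            marker = f"{ancestor}/.zgroup" if ancestor else ".zgroup"
            merged.setdefault(marker, ZGROUP)
        # Prefix every key of this group's store with the group path.
        prefix = "/".join(parts)
        for key, value in refs["refs"].items():
            merged[f"{prefix}/{key}" if prefix else key] = value
    return {"version": 1, "refs": merged}


# Made-up references standing in for what each virtual dataset would produce.
hourly_refs = {
    "version": 1,
    "refs": {".zgroup": ZGROUP, "temp/0": ["hourly_000.nc", 1024, 4096]},
}
daily_refs = {
    "version": 1,
    "refs": {".zgroup": ZGROUP, "temp/0": ["daily_000.nc", 2048, 512]},
}

combined = merge_group_refs({"/hourly": hourly_refs, "/daily": daily_refs})
```

The chunk entries (`[path, offset, length]`) are untouched; only the keys change, so the per-dataset serialization logic could in principle be reused for each node.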

Edit: related to #84 and #11

TomNicholas commented 2 weeks ago

I was going to say that this is a duplicate of #84, but it's actually not, because being able to write from a DataTree is useful even if we have not yet implemented open_virtual_datatree.