zarr-developers / zarr-specs

Zarr core protocol for storage and retrieval of N-dimensional typed arrays
https://zarr-specs.readthedocs.io/
Creative Commons Attribution 4.0 International

Explicitly listing groups/arrays inside group metadata? #284

Status: Open. shoyer opened this issue 8 months ago

shoyer commented 8 months ago

I'm curious if explicitly listing groups/arrays inside group metadata has been discussed before.

The downside is that this would be redundant, potentially duplicated information (though in some sense so is all group metadata; see "implicit groups").

One advantage would be that this would eliminate the need to list the contents of a store and check for the existence of metadata objects, which can sometimes be rather expensive. It's a kind of halfway step toward the consolidated metadata of Zarr v2.
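To make the cost difference concrete, here is a minimal sketch using a toy in-memory store that counts operations. The `members` field in the group's `zarr.json` is hypothetical (it stands in for the proposal and is not part of the v3 spec), and `CountingStore`, `children_by_listing` and `children_from_metadata` are illustrative names, not zarr-python APIs.

```python
import json

# Sketch only: a toy store, abbreviated metadata, and a HYPOTHETICAL "members"
# field in group metadata that lists the group's direct children.


class CountingStore:
    """Toy key/value store that counts list and get operations."""

    def __init__(self, data):
        self._data = dict(data)
        self.list_calls = 0
        self.get_calls = 0

    def list_prefix(self, prefix):
        self.list_calls += 1
        return [key for key in self._data if key.startswith(prefix)]

    def get(self, key):
        self.get_calls += 1
        return self._data[key]


def children_by_listing(store, group_path):
    """Status quo: list keys under the group, then read each child's zarr.json."""
    prefix = f"{group_path}/" if group_path else ""
    children = {}
    for key in store.list_prefix(prefix):
        parts = key[len(prefix):].split("/")
        # A direct child's metadata document lives at "<prefix><name>/zarr.json".
        if len(parts) == 2 and parts[1] == "zarr.json":
            children[parts[0]] = json.loads(store.get(key))["node_type"]
    return children


def children_from_metadata(store, group_path):
    """Proposal: read the child list straight from the group's own zarr.json."""
    key = f"{group_path}/zarr.json" if group_path else "zarr.json"
    return dict(json.loads(store.get(key)).get("members", {}))


def doc(**fields):
    """Encode an abbreviated v3-style metadata document."""
    return json.dumps({"zarr_format": 3, **fields}).encode()


store = CountingStore({
    "zarr.json": doc(node_type="group", attributes={},
                     members={"temperature": "array", "forecasts": "group"}),
    "temperature/zarr.json": doc(node_type="array"),  # shape, data_type, etc. omitted
    "temperature/c/0/0": b"<chunk bytes>",
    "forecasts/zarr.json": doc(node_type="group", attributes={}),
})

print(children_by_listing(store, ""), store.list_calls, store.get_calls)
print(children_from_metadata(store, ""), store.get_calls)
```

Both functions return the same children, but the listing approach needs one list call plus one read per child (and the listing also walks past chunk keys), while the explicit list needs a single read of the group document.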

It's also potentially useful for making group creation/modification atomic, avoiding race conditions like https://github.com/zarr-developers/zarr-python/issues/1435: the canonical list of a group's contents would be a single metadata file rather than a collection of sub-directories, which usually cannot be modified atomically.
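A rough sketch of why a single metadata object helps: if the backend offers some conditional-write primitive (a version or generation precondition; not every store has one), concurrent writers can do an optimistic read-modify-write and retry on conflict, so no update is lost. `VersionedStore`, `put_if_version`, `add_member` and the `members` field are all hypothetical names for illustration.

```python
import json
import threading

# Sketch only: assumes a compare-and-swap style conditional write. The store,
# its methods, and the "members" field are HYPOTHETICAL.


class ConflictError(Exception):
    """Raised when a conditional write loses a race."""


class VersionedStore:
    """Toy store: each key carries a version; a write must name the version it read."""

    def __init__(self):
        self._data = {}  # key -> (version, bytes); absent keys are version 0
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def put_if_version(self, key, value, expected_version):
        with self._lock:
            current, _ = self._data.get(key, (0, None))
            if current != expected_version:
                raise ConflictError(key)
            self._data[key] = (current + 1, value)
            return current + 1


def add_member(store, group_key, name, node_type, max_retries=100):
    """Read-modify-write of one group metadata object, retried on conflict."""
    for _ in range(max_retries):
        version, raw = store.get(group_key)
        meta = json.loads(raw) if raw else {
            "zarr_format": 3, "node_type": "group", "attributes": {}, "members": {}}
        meta.setdefault("members", {})[name] = node_type
        try:
            store.put_if_version(group_key, json.dumps(meta).encode(), version)
            return
        except ConflictError:
            continue  # another writer got there first: re-read and try again
    raise RuntimeError(f"gave up adding {name!r} after {max_retries} attempts")


store = VersionedStore()
threads = [threading.Thread(target=add_member,
                            args=(store, "zarr.json", f"a{i}", "array"))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(json.loads(store.get("zarr.json")[1])["members"]))
```

Each conflict means some other writer succeeded, so with eight writers no thread can fail more than seven times; all eight members end up in the group. With children tracked only as separate sub-directories there is no single object to guard this way.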

jhamman commented 8 months ago

We have been discussing an extension that would provide links to parents and children so that from any point within the hierarchy, you could navigate up or down the tree without having to list the store. This is conceptually similar to how STAC works.
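As an illustration only (the extension's actual schema isn't specified here), the sketch below assumes each node's metadata carries a hypothetical STAC-style `links` list of `{"rel": ..., "href": ...}` objects with hrefs relative to the document's own key. Navigation then needs only reads of known keys, never a listing.

```python
import posixpath

# Sketch only: the "links" field and its layout are HYPOTHETICAL, modeled on
# STAC link objects (rel + relative href).


def resolve(doc_key, href):
    """Resolve a relative href against the key of the document that contains it."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(doc_key), href))


def links(meta, rel):
    """Return the hrefs of all links with the given relation type."""
    return [link["href"] for link in meta.get("links", []) if link["rel"] == rel]


def walk_down(get, key):
    """Depth-first walk that follows "child" links; the store is never listed."""
    yield key
    for href in links(get(key), "child"):
        yield from walk_down(get, resolve(key, href))


def ancestors(get, key):
    """Follow "parent" links from any node up to the root."""
    chain = []
    parents = links(get(key), "parent")
    while parents:
        key = resolve(key, parents[0])
        chain.append(key)
        parents = links(get(key), "parent")
    return chain


docs = {
    "zarr.json": {"node_type": "group",
                  "links": [{"rel": "child", "href": "forecasts/zarr.json"}]},
    "forecasts/zarr.json": {"node_type": "group",
                            "links": [{"rel": "parent", "href": "../zarr.json"},
                                      {"rel": "child", "href": "day1/zarr.json"}]},
    "forecasts/day1/zarr.json": {"node_type": "array",
                                 "links": [{"rel": "parent", "href": "../zarr.json"}]},
}

print(list(walk_down(docs.__getitem__, "zarr.json")))
print(ancestors(docs.__getitem__, "forecasts/day1/zarr.json"))
```

`get` is any function that fetches one metadata document by key, so the same traversal works against a dict, a local directory, or an object store.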

This approach would be more scalable than the current consolidated metadata approach with the obvious tradeoff that walking the tree one node at a time will often be more expensive than loading a single mapping of all the metadata.
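A back-of-the-envelope sketch of that tradeoff, assuming a hypothetical `children` list in each node's metadata: walking costs one read per node visited, while a consolidated document costs one read whose payload grows with the whole hierarchy, even when only one subtree is needed.

```python
import json

# Sketch only: the "children" field is HYPOTHETICAL; costs are counted in
# reads (walk) and in payload bytes (consolidated), not measured latencies.


def make_hierarchy(n_groups, arrays_per_group):
    """Build per-node metadata documents, each listing its children's keys."""
    docs = {"zarr.json": {"node_type": "group", "children": []}}
    for g in range(n_groups):
        gkey = f"g{g}/zarr.json"
        docs["zarr.json"]["children"].append(gkey)
        docs[gkey] = {"node_type": "group", "children": []}
        for a in range(arrays_per_group):
            akey = f"g{g}/a{a}/zarr.json"
            docs[gkey]["children"].append(akey)
            docs[akey] = {"node_type": "array", "children": []}
    return docs


def walk_cost(docs, start="zarr.json"):
    """Number of reads needed to visit every node under start, one node at a time."""
    reads, frontier = 0, [start]
    while frontier:
        key = frontier.pop()
        reads += 1
        frontier.extend(docs[key]["children"])
    return reads


def consolidated_cost(docs):
    """One read, but its payload holds the metadata for the entire hierarchy."""
    return 1, len(json.dumps(docs).encode())


docs = make_hierarchy(n_groups=10, arrays_per_group=5)
print(walk_cost(docs))                  # whole tree: 1 root + 10 groups + 50 arrays
print(walk_cost(docs, "g3/zarr.json"))  # one subtree: 1 group + 5 arrays
print(consolidated_cost(docs))
```

Walking scales with the part of the tree you actually touch; the consolidated read is a single request but always pays for everything.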

cc @rabernat and @jedsundwall

jedsundwall commented 8 months ago

Adding @kbgg who's working on this.