Open yarikoptic opened 2 months ago
Hi @yarikoptic, thanks for sharing this! Looks cool!
FYI, we are planning on open sourcing the solution we have built at Earthmover later this fall.
Hi folks! We released our project! You can read all about it here: https://icechunk.io/
Inspired by #154 by @rabernat, I've decided to share the design we are currently pursuing and to ask for feedback, guidance, and/or collaboration.
In the DANDI archive (https://dandiarchive.org/), where we use a versioned S3 bucket for the actual data storage, we are also working to allow versioning of Zarr filesets. Notes on the ultimate design can be found in
but in a nutshell it is centered on a few simple pieces: an S3 versioned bucket, a checksum over the files in a Zarr, and a "manifest" file that records the keys/versionIds for a given version of the Zarr (so ideas similar to git itself). In more detail:
To show the feasibility of such an approach we provide
But I wonder: is there a way, or a need, to formalize some "zarr manifest" listing that could then be reused across solutions? I am not sure this belongs at the level of storage transformers, since IMHO it should be a specification on top of a Zarr instance rather than a specification within Zarr. WDYT?
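To make the question more concrete, here is a minimal sketch of what such a manifest listing might look like. It is not the DANDI format: the field names (`schemaVersion`, `checksum`, `entries`, `versionId`, `etag`, `size`), the `Entry` class, and the flat MD5-over-listing checksum are all assumptions made up for illustration. It only shows the general idea of pinning each key to an S3 versionId and deriving one content checksum for the whole Zarr.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One object of a Zarr at a given version (hypothetical schema)."""
    key: str         # path of the object relative to the Zarr root
    version_id: str  # S3 VersionId of that object in the versioned bucket
    etag: str        # per-object checksum, e.g. the S3 ETag
    size: int        # object size in bytes


def zarr_checksum(entries):
    """Digest over the sorted (key, etag, size) listing.

    Illustrative scheme only: it depends on content, not on versionIds,
    so re-uploading identical bytes keeps the checksum stable.
    """
    h = hashlib.md5()
    for e in sorted(entries, key=lambda e: e.key):
        h.update(f"{e.key}:{e.etag}:{e.size}\n".encode())
    return h.hexdigest()


def build_manifest(entries):
    """Collect keys/versionIds for one version of a Zarr into a dict."""
    ordered = sorted(entries, key=lambda e: e.key)
    return {
        "schemaVersion": 1,  # assumed field, not part of any spec
        "checksum": zarr_checksum(ordered),
        "entries": {
            e.key: {"versionId": e.version_id, "etag": e.etag, "size": e.size}
            for e in ordered
        },
    }


if __name__ == "__main__":
    # Made-up example values; real ones would come from an S3 listing.
    listing = [
        Entry("0/0", "v-aaa", "etag-chunk", 1024),
        Entry(".zarray", "v-bbb", "etag-meta", 312),
    ]
    print(json.dumps(build_manifest(listing), indent=2))
```

A reader of such a manifest could fetch each key with its recorded versionId to reconstruct exactly that version of the Zarr, independently of later writes to the bucket.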