zarr-developers / zarr-specs

Zarr core protocol for storage and retrieval of N-dimensional typed arrays
https://zarr-specs.readthedocs.io/
Creative Commons Attribution 4.0 International

feature(stores): draft zip file store specification #311

Open jhamman opened 2 weeks ago

jhamman commented 2 weeks ago

This is a working draft of the v3 ZIP file store specification.

xref:

joshmoore commented 2 weeks ago

In my experience, the root of the zip is one of the trickiest parts for data creators (and I assume implementers) to get right, e.g.,

joshmoore commented 2 weeks ago

cc: @DennisHeimbigner

zoj613 commented 2 weeks ago

How useful is a ZipStore in practice? Are there a lot of use cases for it? Given how limited it is (no rename/deletion, etc.), I am wondering if it's worth having a spec for it.

DennisHeimbigner commented 2 weeks ago

I have support equivalent to zipstore in nczarr in the netcdf-c library. I agree that it does not appear to be very useful, but the basic idea behind it is reasonable: a single file containing a complete zarr file tree, with compression applied to the component files to save space. Personally, I think that using a single-file file system (SFFS) with added compression makes more sense. There are several implementations available, and it is easy enough to write your own.

jhamman commented 2 weeks ago

In my experience, the root of the zip is one of the trickiest parts for data creators (and I assume implementers) to get right...

@joshmoore - do you have suggestions for the spec document that would make this clearer?


@zoj613 and @DennisHeimbigner - let's try to avoid making this about alternatives to the ZIP store concept. There are practical reasons to add this (Zarr-Python has long supported a ZIP store interface).

Remember, Zarr can support many storage backends. If there are alternatives to experiment with, let's do that in a separate issue.


@DennisHeimbigner - I would like to get your feedback on the spec as written. Is it aligned with your netcdf-c implementation?

joshmoore commented 2 weeks ago

@joshmoore - do you have suggestions for the spec document that would make this clearer?

Thoughts revolving in my head include:

DennisHeimbigner commented 1 day ago

for the format, the most important item I know of is "don't include the top-level directory" (though I have run into some complaints about that from various repositories, since the behavior differs between implementations, e.g. on Windows)

I think I have always used either Linux zip or Cygwin zip to create zarr zip files. What native Windows program could I use to create a pure Windows zip file? As for the top-level directory, I think it is better to always include it. I say this so that these rules hold, namely:

  1. unzipping a zip store creates a directory tree usable by the zarr directory tree storage manager.
  2. zipping a zarr directory tree creates a zip store conforming to the proposed zip spec.
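
To make the two layouts under discussion concrete, here is a minimal sketch using only Python's standard `zipfile` module. It is not taken from the draft spec or from any Zarr implementation: the helper names `zip_tree` and `build_demo`, the directory name `example.zarr`, the placeholder file contents, and the choice of `ZIP_STORED` are all assumptions made for illustration. It zips the same small directory tree twice, once with entry names relative to the store root and once with the top-level directory kept as a prefix.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_tree(src_dir: Path, zip_path: Path, include_top_level: bool) -> list[str]:
    """Zip every file under src_dir and return the sorted entry names.

    Hypothetical helper for illustration; not part of any Zarr library.
    """
    # Entry names are computed relative to this base directory, so the
    # choice of base decides whether "example.zarr/" prefixes every entry.
    base = src_dir.parent if include_top_level else src_dir
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # ZIP entry names use forward slashes on every platform.
                zf.write(path, arcname=path.relative_to(base).as_posix())
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(zf.namelist())


def build_demo() -> tuple[list[str], list[str]]:
    """Create a tiny placeholder tree and zip it with both layouts."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "example.zarr"  # assumed name, illustration only
        (root / "c" / "0").mkdir(parents=True)
        (root / "zarr.json").write_text("{}")  # placeholder metadata
        (root / "c" / "0" / "0").write_bytes(b"\x00" * 8)  # placeholder chunk
        flat = zip_tree(root, Path(tmp) / "flat.zip", include_top_level=False)
        nested = zip_tree(root, Path(tmp) / "nested.zip", include_top_level=True)
    return flat, nested


flat_names, nested_names = build_demo()
print(flat_names)    # entries relative to the store root
print(nested_names)  # entries prefixed with the top-level directory
```

In the first archive a reader finds `zarr.json` at the root of the zip. In the second it must first discover or be told the `example.zarr/` prefix, which is the interoperability question raised above.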