CLI tools for Zarr V2->V3 conversion

LDeakin commented 1 month ago

I've developed a few CLI tools in zarrs_tools (crates.io / GitHub) that may be useful to some for this challenge.

These tools have not been extensively tested with real-world Zarr V2 data, since we don't use Zarr V2 in my lab.

`zarrs_reencode`

Convert Zarr V2 arrays that use a V3-compatible subset of features to Zarr V3. Many parameters are available to control the output array's encoding.

```sh
zarrs_reencode \
    --chunk-shape 32,32,32 \
    --shard-shape 256,256,256 \
    --bytes-to-bytes-codecs '[ { "name": "blosc", "configuration": { "cname": "blosclz", "clevel": 9, "shuffle": "bitshuffle", "typesize": 2, "blocksize": 0 } } ]' \
    --separator / \
    --attributes-append '{ "looking forward to": "OME-Zarr v0.5" }' \
    array_v2.zarr array_v3.zarr
```
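For anyone new to sharding, here is a minimal Python sketch of how the `--chunk-shape`, `--shard-shape`, `--bytes-to-bytes-codecs`, `--separator` and `--attributes-append` values above could map onto V3 array metadata, based on my reading of the Zarr V3 `sharding_indexed` codec spec. The helper names, the 1024³ `uint16` array, and the index codecs are assumptions for illustration; the metadata `zarrs_reencode` actually writes may differ.

```python
import json


def sharded_v3_metadata(shape, data_type, chunk_shape, shard_shape,
                        bytes_to_bytes_codecs, separator="/", fill_value=0,
                        attributes=None):
    """Hypothetical helper: V3 array metadata for a sharded array.

    The outer chunk grid uses the shard shape; the sharding_indexed codec
    splits each shard into inner chunks of chunk_shape.
    """
    if not len(shape) == len(chunk_shape) == len(shard_shape):
        raise ValueError("shape, chunk_shape and shard_shape must have the same rank")
    for chunk, shard in zip(chunk_shape, shard_shape):
        # A shard must contain a whole number of inner chunks along each axis.
        if shard % chunk != 0:
            raise ValueError(f"shard size {shard} is not a multiple of chunk size {chunk}")
    # Inner chunks: array -> bytes codec, then the user's bytes-to-bytes codecs.
    inner_codecs = [{"name": "bytes", "configuration": {"endian": "little"}}]
    inner_codecs += list(bytes_to_bytes_codecs)
    return {
        "zarr_format": 3,
        "node_type": "array",
        "shape": list(shape),
        "data_type": data_type,
        "chunk_grid": {"name": "regular",
                       "configuration": {"chunk_shape": list(shard_shape)}},
        "chunk_key_encoding": {"name": "default",
                               "configuration": {"separator": separator}},
        "fill_value": fill_value,
        "codecs": [{
            "name": "sharding_indexed",
            "configuration": {
                "chunk_shape": list(chunk_shape),
                "codecs": inner_codecs,
                # Assumed index encoding: little-endian offsets plus a CRC32C checksum.
                "index_codecs": [
                    {"name": "bytes", "configuration": {"endian": "little"}},
                    {"name": "crc32c"},
                ],
                "index_location": "end",
            },
        }],
        "attributes": dict(attributes or {}),
    }


def inner_chunks_per_shard(chunk_shape, shard_shape):
    """Number of inner chunks stored in each (full) shard."""
    count = 1
    for chunk, shard in zip(chunk_shape, shard_shape):
        count *= shard // chunk
    return count


# The same codec string passed to --bytes-to-bytes-codecs above.
blosc = json.loads('[ { "name": "blosc", "configuration": { "cname": "blosclz", '
                   '"clevel": 9, "shuffle": "bitshuffle", "typesize": 2, "blocksize": 0 } } ]')
metadata = sharded_v3_metadata(
    shape=(1024, 1024, 1024),  # hypothetical array shape
    data_type="uint16",        # hypothetical; consistent with typesize 2 in the blosc config
    chunk_shape=(32, 32, 32),
    shard_shape=(256, 256, 256),
    bytes_to_bytes_codecs=blosc,
    # Assumes the source array has no existing attributes to append to.
    attributes={"looking forward to": "OME-Zarr v0.5"},
)
print(json.dumps(metadata, indent=2))
print(inner_chunks_per_shard((32, 32, 32), (256, 256, 256)))
```

With 32³ inner chunks in 256³ shards, each shard file holds (256/32)³ = 512 inner chunks, so far fewer files are written while individual 32³ chunks can still be read on their own.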

`zarrs_info`

Output the V3-equivalent metadata of a V2 array, provided the array can be converted by changing its metadata alone. It also works for groups.

```sh
zarrs_info group_v2.zarr/array metadata-v3
zarrs_info group_v2.zarr metadata-v3
```

It takes care of things like the typesize in blosc codec metadata.
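To illustrate what a metadata-only conversion involves, here is a hedged Python sketch that maps a V2 `.zarray` (plus `.zattrs`) to V3 array metadata. It is not how `zarrs_info` is implemented: `v2_to_v3_metadata` and the example array are hypothetical, and the sketch deliberately rejects cases (Fortran order, filters, compressors other than blosc and gzip) that would need more work. The blosc branch shows the `typesize` detail: V2 blosc metadata does not record it, so it is taken from the dtype's item size.

```python
import json

# Assumed mapping from V2 NumPy-style dtype strings (kind, itemsize) to V3 data types.
_V3_DTYPES = {
    ("b", 1): "bool",
    ("i", 1): "int8", ("i", 2): "int16", ("i", 4): "int32", ("i", 8): "int64",
    ("u", 1): "uint8", ("u", 2): "uint16", ("u", 4): "uint32", ("u", 8): "uint64",
    ("f", 2): "float16", ("f", 4): "float32", ("f", 8): "float64",
    ("c", 8): "complex64", ("c", 16): "complex128",
}
# V2 (numcodecs) blosc shuffle integers -> V3 blosc shuffle strings.
_SHUFFLE = {0: "noshuffle", 1: "shuffle", 2: "bitshuffle"}


def v2_to_v3_metadata(zarray, zattrs=None):
    """Sketch: V3 array metadata equivalent to a V2 .zarray, or ValueError
    if the array cannot be converted by rewriting metadata alone."""
    dtype = zarray["dtype"]
    if not isinstance(dtype, str) or len(dtype) < 3 or not dtype[2:].isdigit():
        raise ValueError(f"unsupported dtype {dtype!r}")
    byteorder, kind, itemsize = dtype[0], dtype[1], int(dtype[2:])
    if (kind, itemsize) not in _V3_DTYPES:
        raise ValueError(f"unsupported dtype {dtype!r}")
    if zarray.get("order", "C") != "C":
        raise ValueError("order 'F' would need a transpose codec; not handled here")
    if zarray.get("filters"):
        raise ValueError("V2 filters are not handled here")
    if zarray.get("fill_value") is None:
        raise ValueError("V3 requires a non-null fill_value")

    # The bytes codec only needs an endianness for multi-byte types.
    bytes_codec = {"name": "bytes"}
    if itemsize > 1:
        bytes_codec["configuration"] = {"endian": "little" if byteorder == "<" else "big"}
    codecs = [bytes_codec]

    compressor = zarray.get("compressor")
    if compressor is not None:
        if compressor["id"] == "blosc":
            if compressor["shuffle"] not in _SHUFFLE:
                raise ValueError(f"unsupported blosc shuffle {compressor['shuffle']}")
            codecs.append({"name": "blosc", "configuration": {
                "cname": compressor["cname"],
                "clevel": compressor["clevel"],
                "shuffle": _SHUFFLE[compressor["shuffle"]],
                "typesize": itemsize,  # implicit in V2, explicit in V3
                "blocksize": compressor.get("blocksize", 0),
            }})
        elif compressor["id"] == "gzip":
            codecs.append({"name": "gzip", "configuration": {"level": compressor["level"]}})
        else:
            raise ValueError(f"unsupported compressor {compressor['id']!r}")

    return {
        "zarr_format": 3,
        "node_type": "array",
        "shape": list(zarray["shape"]),
        "data_type": _V3_DTYPES[(kind, itemsize)],
        "chunk_grid": {"name": "regular",
                       "configuration": {"chunk_shape": list(zarray["chunks"])}},
        # The "v2" chunk key encoding keeps the existing chunk file names valid.
        "chunk_key_encoding": {"name": "v2", "configuration": {
            "separator": zarray.get("dimension_separator", ".")}},
        "fill_value": zarray["fill_value"],
        "codecs": codecs,
        "attributes": dict(zattrs or {}),
    }


# Hypothetical V2 array metadata.
example_v2 = {
    "zarr_format": 2, "shape": [512, 512, 512], "chunks": [64, 64, 64], "dtype": "<u2",
    "compressor": {"id": "blosc", "cname": "lz4", "clevel": 5, "shuffle": 1, "blocksize": 0},
    "fill_value": 0, "order": "C", "filters": None, "dimension_separator": "/",
}
print(json.dumps(v2_to_v3_metadata(example_v2), indent=2))
```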

`zarrs_ome`

Generate multiscale arrays for visualisation.

```sh
zarrs_ome array.zarr array.ome.zarr
```

Since last year, my lab has used zarrs_ome to create sharded Zarr V3 multiscale images ($\gtrapprox 3000^3$ voxels) for visualisation in Neuroglancer.
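As a rough illustration of what a multiscale pyramid for a 3000³ volume looks like, the sketch below halves every axis (rounding up) until a level fits in a single chunk. The stopping rule, the 256³ chunk size, and the function name are assumptions; `zarrs_ome`'s actual downsampling method and number of levels may differ.

```python
import math


def multiscale_shapes(shape, chunk_shape):
    """Hypothetical pyramid: halve each axis (rounding up) until a level fits in one chunk."""
    if any(chunk < 1 for chunk in chunk_shape):
        raise ValueError("chunk sizes must be positive")
    levels = [tuple(shape)]
    while any(size > chunk for size, chunk in zip(levels[-1], chunk_shape)):
        levels.append(tuple((size + 1) // 2 for size in levels[-1]))
    return levels


base = (3000, 3000, 3000)
levels = multiscale_shapes(base, chunk_shape=(256, 256, 256))
for i, level in enumerate(levels):
    print(f"level {i}: {level}")

# Total voxels across all levels, relative to the base level.
overhead = sum(math.prod(level) for level in levels) / math.prod(base)
print(f"pyramid size relative to base: {overhead:.3f}")
```

Because each 2× level of a 3D volume has roughly 1/8 of the voxels of the level before it, the whole pyramid stays close to 8/7 ≈ 1.14 times the size of the base level.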

normanrz commented 1 month ago

Really cool!

joshmoore commented 1 month ago

Agreed! I'll compare the outputs (and, ideally, the speeds) against a couple of other implementations, starting later this week if all goes well.

joshmoore commented 1 month ago

```
Reencode 4496763-v2.zarr/0 to 4496763-v3-reencode
    read:  ~152.78ms @ 3.60GB/s
    write: ~648.93ms @ 0.77GB/s
    total: 801.71ms
    size:  550.78MB to 502.71MB (838.86MB uncompressed)
```

:+1: (Now trying to get an apples-to-apples comparison with, say, TensorStore.)
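The figures in the reencode log are consistent with throughput being measured against on-disk (compressed) sizes rather than the uncompressed size. This small Python check recomputes them; treating MB and GB as powers of ten is my assumption, and the recomputed values match the log only to within rounding.

```python
def gb_per_s(size_mb: float, elapsed_ms: float) -> float:
    """Throughput in GB/s for size_mb megabytes moved in elapsed_ms milliseconds."""
    return (size_mb / 1000) / (elapsed_ms / 1000)


# Values copied from the log above.
read_ms, write_ms = 152.78, 648.93
v2_mb, v3_mb, raw_mb = 550.78, 502.71, 838.86

read_gbps = gb_per_s(v2_mb, read_ms)    # V2 bytes read per second
write_gbps = gb_per_s(v3_mb, write_ms)  # V3 bytes written per second
raw_gbps = gb_per_s(raw_mb, read_ms)    # uncompressed bytes per read second, for contrast

print(f"read:  {read_gbps:.3f} GB/s")
print(f"write: {write_gbps:.3f} GB/s")
print(f"uncompressed-based read rate: {raw_gbps:.3f} GB/s")
print(f"total: {read_ms + write_ms:.2f} ms")
print(f"compression ratio: V2 {raw_mb / v2_mb:.2f}x, V3 {raw_mb / v3_mb:.2f}x")
print(f"V3 output is {100 * (1 - v3_mb / v2_mb):.1f}% smaller than the V2 input")
```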

joshmoore commented 3 weeks ago

Closing this. For anyone interested in trying this out, see the `--output-script` option.

ome / ome2024-ngff-challenge