ome / ome-zarr-py

Implementation of next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
https://pypi.org/project/ome-zarr

Validate ome.zarr file with the python library? #400

Open constantinpape opened 2 weeks ago

constantinpape commented 2 weeks ago

Is there a way to validate an ome.zarr file with the python library to check if it follows the spec?

I checked the documentation, but could not find a dedicated function for it.

The closest I found was ome_zarr info, but for the file I want to validate it does not yield any output:

$ ome_zarr info ngff-v2/Platynereis-H2B-TL.ome.zarr

I have made the file available for testing here: https://drive.google.com/file/d/1WSHQWkOXUfSBahOJdVzrKbj91mczKhZP/view?usp=sharing.

And any other way to validate the file would also be fine with me.

d-v-b commented 2 weeks ago

try this (I definitely need to add a CLI like this to pydantic-ome-ngff)

# /// script
# requires-python = ">=3.9"
# dependencies = [
#   "pydantic-ome-ngff==0.6.1",
#   "zarr < 3.0.0"
# ]
# ///
import sys

import zarr
from pydantic_ome_ngff.v04 import MultiscaleGroup

# Path to the Zarr group to validate, passed as the first command-line argument.
fname = sys.argv[1]
group = zarr.open_group(fname, mode='r')
print(f'validating {fname}')
try:
    # Raises if the group's metadata does not fit the OME-NGFF v0.4 multiscale model.
    MultiscaleGroup.from_zarr(group)
    print(f'validation of {fname} succeeded.')
except ValueError as e:
    print(f'validation failed with the following message: {e}')

invoke it with hatch, uv, or any other tool that understands Python's inline script metadata (the "# /// script" header at the top of the file):

bennettd@dvb-desktop-0 ➜  pydantic-ome-ngff git:(main) ✗ hatch run validate.py ~/Downloads/Platynereis-H2B-TL.ome.zarr/c0-t0
validating /home/bennettd/Downloads/Platynereis-H2B-TL.ome.zarr/c0-t0
validation of /home/bennettd/Downloads/Platynereis-H2B-TL.ome.zarr/c0-t0 succeeded.

According to my tool, c0-t0 is a valid multiscale group, but the root group is not, because the root group does not contain the multiscales metadata (as expected, I think).

d-v-b commented 2 weeks ago

when I look at your metadata, the only thing that stands out (besides the lack of a translation transformation defined for each scale level) is the unit, which is non-standard but the spec is not normative about the unit, so that shouldn't be a validation error.
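To look at these two points for a given group, here is a minimal sketch that lists the scale levels without a translation and the units declared on the axes. It is not part of ome-zarr-py or pydantic-ome-ngff; it only assumes the v0.4 metadata layout (multiscales, then axes and datasets, then coordinateTransformations). The function names and the sample metadata are hypothetical and are not taken from the file in this issue.

```python
from typing import Any, Dict, List


def levels_without_translation(attrs: Dict[str, Any]) -> List[str]:
    """Return the dataset paths whose coordinateTransformations have no 'translation' entry."""
    missing = []
    for multiscale in attrs.get("multiscales", []):
        for dataset in multiscale.get("datasets", []):
            kinds = {t.get("type") for t in dataset.get("coordinateTransformations", [])}
            if "translation" not in kinds:
                missing.append(dataset["path"])
    return missing


def axis_units(attrs: Dict[str, Any]) -> Dict[str, str]:
    """Return {axis name: unit} for every axis that declares a unit."""
    units = {}
    for multiscale in attrs.get("multiscales", []):
        for axis in multiscale.get("axes", []):
            if "unit" in axis:
                units[axis["name"]] = axis["unit"]
    return units


# Illustrative metadata only (hypothetical values, not the file discussed here).
example_attrs = {
    "multiscales": [
        {
            "axes": [
                {"name": "z", "type": "space", "unit": "um"},
                {"name": "y", "type": "space", "unit": "um"},
                {"name": "x", "type": "space", "unit": "um"},
            ],
            "datasets": [
                {
                    "path": "s0",
                    "coordinateTransformations": [{"type": "scale", "scale": [1.0, 1.0, 1.0]}],
                },
                {
                    "path": "s1",
                    "coordinateTransformations": [
                        {"type": "scale", "scale": [2.0, 2.0, 2.0]},
                        {"type": "translation", "translation": [0.5, 0.5, 0.5]},
                    ],
                },
            ],
        }
    ]
}

print(levels_without_translation(example_attrs))
print(axis_units(example_attrs))
```

For a real group opened with zarr, the attributes can be passed as a plain dict, e.g. levels_without_translation(group.attrs.asdict()).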

joshmoore commented 2 weeks ago

ome_zarr view ...fileset... will use the JavaScript-based validator locally.

constantinpape commented 2 weeks ago

Thanks for the feedback:

The answers take care of my immediate issue, but I think it would be nice to have a straightforward CLI for validation (ideally here in the library, but via pydantic_ome_ngff would also be a good solution). I will leave the issue open, but feel free to close it if not relevant.
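As a rough idea of what such a CLI could look like, here is a minimal sketch that wraps the script above in argparse and returns a non-zero exit code on failure, so it can run in batch jobs. This is not an existing command in ome-zarr-py or pydantic-ome-ngff; the names validate_path and main, the argument layout, and the output format are all assumptions. Only zarr.open_group and MultiscaleGroup.from_zarr are taken from the script above.

```python
import argparse
import sys
from typing import Callable, List, Optional


def validate_path(path: str) -> None:
    """Raise ValueError if `path` is not a valid OME-NGFF v0.4 multiscale group."""
    # Imported lazily so the argument parsing works without these packages installed.
    import zarr
    from pydantic_ome_ngff.v04 import MultiscaleGroup

    group = zarr.open_group(path, mode="r")
    MultiscaleGroup.from_zarr(group)


def main(
    argv: Optional[List[str]] = None,
    validate: Callable[[str], None] = validate_path,
) -> int:
    """Validate each path; return 0 if all pass, 1 if any fail."""
    parser = argparse.ArgumentParser(
        description="Validate Zarr groups as OME-NGFF v0.4 multiscale groups (sketch)."
    )
    parser.add_argument("paths", nargs="+", help="paths to Zarr groups to validate")
    args = parser.parse_args(argv)

    failures = 0
    for path in args.paths:
        try:
            validate(path)
        except ValueError as e:
            failures += 1
            print(f"FAIL {path}: {e}", file=sys.stderr)
        else:
            print(f"OK   {path}")
    return 1 if failures else 0


# To use this as a script, add at the bottom of the file:
#     raise SystemExit(main())
```

The validate parameter exists only so that a different check (or a stub in tests) can be swapped in without touching the argument handling.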

d-v-b commented 2 weeks ago

"but I think it would be nice to have a straightforward CLI for validation"

Agreed, for that effort it would be helpful to know what exactly you want to validate. There are a few scenarios:

these correspond to rather different data models.

constantinpape commented 2 weeks ago

@d-v-b: for my use case I would want to validate that all groups in a hierarchy are either multiscale groups or contain multiscale groups.

For some context: this issue arose while converting light-sheet data to ome.zarr for the ome-ngff-challenge. For this data we currently use the BDV.N5 data model and somehow need to map this to ome.zarr. See https://github.com/ome/ome2024-ngff-challenge/issues/45 for details.
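The rule described above (every group is either a multiscale group or contains one) could be sketched as a recursive walk. This is only an illustration under some assumptions: "contains" is read transitively (any descendant counts), groups are expected to expose .attrs and a .groups() iterator of (name, group) pairs as zarr groups do, and the function names are hypothetical. The default check only looks for a "multiscales" key; validates_as_multiscale swaps in the pydantic-ome-ngff check from the script earlier in this thread.

```python
from typing import Callable, List


def has_multiscales(group) -> bool:
    """Cheap stand-in check: the group carries a 'multiscales' attribute."""
    return "multiscales" in group.attrs


def validates_as_multiscale(group) -> bool:
    """Stricter check using pydantic-ome-ngff (imported lazily, v0.4 model)."""
    from pydantic_ome_ngff.v04 import MultiscaleGroup

    try:
        MultiscaleGroup.from_zarr(group)
        return True
    except ValueError:
        return False


def _walk(group, path: str, is_multiscale: Callable, invalid: List[str]) -> bool:
    """Return True if `group` is, or transitively contains, a multiscale group."""
    if is_multiscale(group):
        return True  # do not descend into a multiscale group
    found = False
    for name, child in group.groups():
        child_path = f"{path.rstrip('/')}/{name}"
        # Visit every child (no short-circuit) so all invalid groups are reported.
        if _walk(child, child_path, is_multiscale, invalid):
            found = True
    if not found:
        invalid.append(path)
    return found


def find_invalid_groups(root, is_multiscale: Callable = has_multiscales) -> List[str]:
    """Return paths of groups that neither are nor contain a multiscale group."""
    invalid: List[str] = []
    _walk(root, "/", is_multiscale, invalid)
    return invalid
```

With a real file this would be called as find_invalid_groups(zarr.open_group(fname, mode='r'), validates_as_multiscale); an empty list means the hierarchy satisfies the rule.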

joshmoore commented 1 week ago
"although for the specific use case I would need something that works without a browser (to validate directly on a cluster)"

:+1: for that in general.