ome / ngff

Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
https://ngff.openmicroscopy.org
Other
119 stars 41 forks source link

units unit fix #157

Closed will-moore closed 1 year ago

will-moore commented 2 years ago

Fixes #156

github-actions[bot] commented 2 years ago

Automated Review URLs

joshmoore commented 2 years ago

So just to clarify: we (1) generated invalid (from omero-cli-zarr) and (2) validated invalid datasets, but the spec never said that it was correct, right?

I'm trying to get my head around whether or not we need a major warning of any form (and/or be permissive of the plural for this version?!)

will-moore commented 2 years ago

Not sure what you mean by:

but the spec never said that it was correct

Translating the spec into JSON schema (or vice versa) was a manual process, hence error-prone.

It's invalid for the "unit" to be a number (instead of a string) but it's not invalid to add a custom "units" value, since JSON schema allows custom keys. I think it's possible to set the schema to dis-allow any custom / unrecognised keys, but I think we've chosen not to do that because we want to allow extensibility or at least temp addition of custom metadata. Even with the fix to "unit", we are still permissive of "units"!

I'm not sure we need a major warning. The most significant "bug" would be that if you used the "units" schema, then it wouldn't have flagged "unit": 100 as invalid. But that seems kinda edge-case scenario.

joshmoore commented 2 years ago

Not sure what you mean by:

i.e. the typo was never in the spec, but as you say, only in the translation to the validator and omero-cli-zarr.

Even with the fix to "unit", we are still permissive of "units"!

With permissive I meant more that we would also recognize the plural as valid. (Would be quite the kludge)

The most significant "bug" would be that if you used the "units" schema, then it wouldn't have flagged "unit": 100 as invalid. But that seems kinda edge-case scenario.

For this repo, agreed. But do we have a feeling how many filesets are out there? Can the validator warn about the particular use case of "units"? i.e. (softly) disallow it to help people out?

sbesson commented 2 years ago

My understanding is that the generated datasets are syntactically valid since unit is not mandatory but a recommended property of the axes specification and custom keys like units are not forbidden by the spec

Despite their validity, the datasets we generated did not represent what was intending as the unit metadata was effectively missing. Additionally, the published schema were incorrectly validating the units key (if it existed) and ignoring the unit key (if it existed).

I feel that supporting both forms is a lot of work and likely overkill esp. without having any idea of the impact (so far only omero-cli-zarr to the best of our knowledge would generate dataset using the plural form). Including some tooling/extending the validator to warn of the presence of units might be a good idea.

will-moore commented 2 years ago

Going through v0.4 samples listed at https://idr.github.io/ome-ngff-samples/ these are the ones with "units" that need fixing:

As well as a couple of others that I've created and listed for the https://will-moore.github.io/ome-ngff-tools/

will-moore commented 2 years ago

I've manually fixed "units" -> "unit" in the 'done' samples above, with e.g.:

mc cp uk1s3/idr/zarr/v0.4/idr0050A/4995115.zarr/.zattrs 4995115.zattrs
vi 4995115.zattrs 
mc cp 4995115.zattrs uk1s3/idr/zarr/v0.4/idr0050A/4995115.zarr/.zattrs

but this isn't going to scale well for those 2 plates. Any tips on how to script/automate this?

fm3 commented 2 years ago

Thanks for looking into this!

FYI, we just ran into units again in https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr/labels/0/.zattrs – the dataset has a checkmark in the above list, but apparently the labels .zattrs file was not yet adapted?

will-moore commented 2 years ago

@fm3 Thanks for picking that up. Fixed units -> unit in https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr/labels/0/ just now.

will-moore commented 2 years ago

To fix units -> unit in plate...

Download just the .zattrs locally...

aws s3 sync --no-sign-request  --endpoint-url https://uk1s3.embassy.ebi.ac.uk --exclude '*' --include "*.zattrs" s3://idr/zarr/v0.4/idr0001A/2551.zarr .

Replaced all "units" with "unit" (1724 occurrences across 862 files)

Tried to re-upload with aws but no joy (dont' have mc installed locally):

$ aws s3 cp .zattrs s3://idr/zarr/v0.4/idr0001A/2551.zarr/.zattrs
upload failed: ./.zattrs to s3://idr/zarr/v0.4/idr0001A/2551.zarr/.zattrs An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.

Instead, copied to where I do have mc...

$ rsync -rvP --progress ./ ome-zarr-dev1.openmicroscopy.org:/lifesci/groups/jrs/wmoore/idr0001/2551.zarr

And then from there uploaded as before...

$ mc cp -r 2551.zarr/ uk1s3/idr/zarr/v0.4/idr0001A/2551.zarr/ 
will-moore commented 2 years ago

Repeated with plate: idr0072B/9512.zarr: 2880 occurrences replaced in 1440 files.

rsync -rvP --progress ./ ome-zarr-dev1.openmicroscopy.org:/lifesci/groups/jrs/wmoore/idr0072/9512.zarr

etc.

Tested that all wells and images in plate are valid. Checked for "unit" in images. Eg. https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0072B/9512.zarr/I/13/13/

will-moore commented 2 years ago

@joshmoore All samples above are fixed "units" -> "unit" now. Good to merge and get this fix in?

will-moore commented 1 year ago

Any outstanding fixes required here? Would be good to get this in.

will-moore commented 1 year ago

@joshmoore - Good to get this fix in? - since this fixes an existing bug with the live schemas. Then we can consider #168 as a potential follow-up?