Closed will-moore closed 1 year ago
So just to clarify: we (1) generated invalid (from omero-cli-zarr) and (2) validated invalid datasets, but the spec never said that it was correct, right?
I'm trying to get my head around whether or not we need a major warning of any form (and/or be permissive of the plural for this version?!)
Not sure what you mean by:
but the spec never said that it was correct
Translating the spec into JSON schema (or vice versa) was a manual process, hence error-prone.
It's invalid for the "unit" to be a number (instead of a string) but it's not invalid to add a custom "units" value, since JSON schema allows custom keys. I think it's possible to set the schema to dis-allow any custom / unrecognised keys, but I think we've chosen not to do that because we want to allow extensibility or at least temp addition of custom metadata. Even with the fix to "unit", we are still permissive of "units"!
I'm not sure we need a major warning. The most significant "bug" would be that if you used the "units" schema, then it wouldn't have flagged "unit": 100
as invalid. But that seems kinda edge-case scenario.
Not sure what you mean by:
i.e. the typo was never in the spec, but as you say, only in the translation to the validator and omero-cli-zarr.
Even with the fix to "unit", we are still permissive of "units"!
With permissive I meant more that we would also recognize the plural as valid. (Would be quite the kludge)
The most significant "bug" would be that if you used the "units" schema, then it wouldn't have flagged "unit": 100 as invalid. But that seems kinda edge-case scenario.
For this repo, agreed. But do we have a feeling how many filesets are out there? Can the validator warn about the particular use case of "units"
? i.e. (softly) disallow it to help people out?
My understanding is that the generated datasets are syntactically valid since unit
is not mandatory but a recommended property of the axes
specification and custom keys like units
are not forbidden by the spec
Despite their validity, the datasets we generated did not represent what was intending as the unit metadata was effectively missing. Additionally, the published schema were incorrectly validating the units
key (if it existed) and ignoring the unit
key (if it existed).
I feel that supporting both forms is a lot of work and likely overkill esp. without having any idea of the impact (so far only omero-cli-zarr
to the best of our knowledge would generate dataset using the plural form). Including some tooling/extending the validator to warn of the presence of units
might be a good idea.
Going through v0.4 samples listed at https://idr.github.io/ome-ngff-samples/ these are the ones with "units" that need fixing:
As well as a couple of others that I've created and listed for the https://will-moore.github.io/ome-ngff-tools/
I've manually fixed "units" -> "unit" in the 'done' samples above, with e.g.:
mc cp uk1s3/idr/zarr/v0.4/idr0050A/4995115.zarr/.zattrs 4995115.zattrs
vi 4995115.zattrs
mc cp 4995115.zattrs uk1s3/idr/zarr/v0.4/idr0050A/4995115.zarr/.zattrs
but this isn't going to scale well for those 2 plates. Any tips on how to script/automate this?
Thanks for looking into this!
FYI, we just ran into units
again in https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr/labels/0/.zattrs – the dataset has a checkmark in the above list, but apparently the labels .zattrs file was not yet adapted?
@fm3 Thanks for picking that up. Fixed units
-> unit
in https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr/labels/0/ just now.
To fix units -> unit in plate...
Download just the .zattrs locally...
aws s3 sync --no-sign-request --endpoint-url https://uk1s3.embassy.ebi.ac.uk --exclude '*' --include "*.zattrs" s3://idr/zarr/v0.4/idr0001A/2551.zarr .
Replaced all "units" with "unit" (1724 occurrences across 862 files)
Tried to re-upload with aws but no joy (dont' have mc
installed locally):
$ aws s3 cp .zattrs s3://idr/zarr/v0.4/idr0001A/2551.zarr/.zattrs
upload failed: ./.zattrs to s3://idr/zarr/v0.4/idr0001A/2551.zarr/.zattrs An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.
Instead, copied to where I do have mc
...
$ rsync -rvP --progress ./ ome-zarr-dev1.openmicroscopy.org:/lifesci/groups/jrs/wmoore/idr0001/2551.zarr
And then from there uploaded as before...
$ mc cp -r 2551.zarr/ uk1s3/idr/zarr/v0.4/idr0001A/2551.zarr/
Repeated with plate: idr0072B/9512.zarr: 2880 occurrences replaced in 1440 files.
rsync -rvP --progress ./ ome-zarr-dev1.openmicroscopy.org:/lifesci/groups/jrs/wmoore/idr0072/9512.zarr
etc.
Tested that all wells and images in plate are valid. Checked for "unit" in images. Eg. https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0072B/9512.zarr/I/13/13/
@joshmoore All samples above are fixed "units" -> "unit" now. Good to merge and get this fix in?
Any outstanding fixes required here? Would be good to get this in.
@joshmoore - Good to get this fix in? - since this fixes an existing bug with the live schemas. Then we can consider #168 as a potential follow-up?
Fixes #156