ome / ZarrReader

ZarrReader: Update handling of pre-existing plate metadata #49

Closed: dgault closed this 1 year ago

dgault commented 1 year ago

This PR fixes the IDR import errors seen in https://github.com/IDR/idr-metadata/issues/642#issuecomment-1447820599 and in other datasets.

Without this PR, these datasets fail validation because of duplicate keys in the WellSample ImageRefs, for example:

cvc-identity-constraint.4.2.2: Duplicate key value [Image:24] found for identity constraint "WellSampleImageRefIDKey" of element "OME".
cvc-identity-constraint.4.2.2: Duplicate key value [Image:39] found for identity constraint "WellSampleImageRefIDKey" of element "OME".
cvc-identity-constraint.4.2.2: Duplicate key value [Image:42] found for identity constraint "WellSampleImageRefIDKey" of element "OME".
cvc-identity-constraint.4.2.2: Duplicate key value [Image:45] found for identity constraint "WellSampleImageRefIDKey" of element "OME".
cvc-identity-constraint.4.2.2: Duplicate key value [Image:48] found for identity constraint "WellSampleImageRefIDKey" of element "OME".
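The "WellSampleImageRefIDKey" constraint requires each Image to be referenced by at most one WellSample in the OME document. The sketch below reproduces that uniqueness check in plain Java. It is illustrative only: the `WellSample` record and `duplicateImageRefs` method are hypothetical stand-ins, not the OME model classes or the schema validator itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ImageRefCheck {

    // Hypothetical, simplified stand-in for an OME WellSample: its own ID plus
    // the ID of the Image it references (e.g. "Image:24").
    public record WellSample(String id, String imageRefId) {}

    // Returns every ImageRef ID referenced by more than one WellSample,
    // mirroring the "WellSampleImageRefIDKey" uniqueness constraint.
    public static Set<String> duplicateImageRefs(List<WellSample> samples) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (WellSample s : samples) {
            // add() returns false if this ImageRef was already seen
            if (!seen.add(s.imageRefId())) {
                duplicates.add(s.imageRefId());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<WellSample> samples = new ArrayList<>();
        samples.add(new WellSample("WellSample:0:0:0", "Image:23"));
        samples.add(new WellSample("WellSample:0:0:1", "Image:24"));
        // A second WellSample pointing at Image:24 violates the constraint.
        samples.add(new WellSample("WellSample:0:1:0", "Image:24"));
        System.out.println(duplicateImageRefs(samples)); // prints [Image:24]
    }
}
```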

This was caused by the previous logic, which first populated the metadata from the existing METADATA.ome.xml file (if present) and then overrode it by parsing either the Zarr attrs or the directory structure (depending on the format version). Because the indexing differs between the original XML and the parsed structure, this resulted in some duplicate IDs (notably in wells with multiple fields).

The new logic simply removes the existing OME-XML metadata for plates and wells and populates it only from the NGFF metadata. With this PR there should be no validation failures for the idr0011 dataset.
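To illustrate why the old overlay produced duplicates and the new approach does not, here is a toy model in plain Java. It assumes, purely for illustration, that the XML and the NGFF parsing assign different WellSample IDs to the same fields; the ID strings and the `mergeOld`/`rebuildNew` methods are hypothetical and do not reproduce the ZarrReader code.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class PlateMetadataMerge {

    // Toy model: WellSample ID -> referenced Image ID. Both ID schemes are
    // hypothetical; they only show two sources indexing fields differently.
    public static Map<String, String> fromXml() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("WellSample:0:0:0", "Image:0");
        m.put("WellSample:0:1:0", "Image:1");
        return m;
    }

    public static Map<String, String> fromNgff() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("WellSample:0:0:0", "Image:0");
        m.put("WellSample:0:0:1", "Image:1");
        return m;
    }

    // Old approach: seed from the XML, then overlay the NGFF entries.
    // Keys that differ between the two schemes are never overwritten,
    // so stale XML entries survive alongside the NGFF ones.
    public static Map<String, String> mergeOld() {
        Map<String, String> merged = new LinkedHashMap<>(fromXml());
        merged.putAll(fromNgff());
        return merged;
    }

    // New approach: discard pre-existing plate/well entries and
    // populate only from the NGFF metadata.
    public static Map<String, String> rebuildNew() {
        return new LinkedHashMap<>(fromNgff());
    }

    // True if any Image is referenced by more than one WellSample.
    public static boolean hasDuplicateRefs(Collection<String> imageRefs) {
        Set<String> seen = new HashSet<>();
        for (String ref : imageRefs) {
            if (!seen.add(ref)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicateRefs(mergeOld().values()));   // prints true
        System.out.println(hasDuplicateRefs(rebuildNew().values())); // prints false
    }
}
```

Rebuilding from a single source trades away anything that existed only in the original XML, but it guarantees that the plate/well IDs come from one consistent indexing scheme.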

dominikl commented 1 year ago

👍 Tested with bftools' showinf, including the OMEZarrReader with this PR (FYI: on pilot-idr-testing /ngff/bftools)

dgault commented 1 year ago

Retested the idr0011 dataset today as a sanity check and went through the metadata for plates, wells and images. Everything appears to match the metadata in METADATA.OME.XML. Merging this PR and tagging the repo.