ome / bioformats

Bio-Formats is a Java library for reading and writing data in life sciences image file formats. It is developed by the Open Microscopy Environment. Bio-Formats is released under the GNU General Public License (GPL); commercial licenses are available from Glencoe Software.
https://www.openmicroscopy.org/bio-formats
GNU General Public License v2.0

OME-TIFF Reader overwriting IDs #3685

Open dgault opened 3 years ago

dgault commented 3 years ago

This issue was raised in an Image.sc forum thread: https://forum.image.sc/t/different-custom-metadata-parsing-bioformat-using-fiji-knime-command-line/51704

A sample file has been provided, but the problem can also be easily reproduced by changing IDs in existing sample files. Tested with version 6.6.1.

The issue appears to be the call to MetadataTools.populatePixels within the OMETiffReader, which overwrites the existing IDs set on the metadata store.

imagesc-bot commented 3 years ago

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/different-custom-metadata-parsing-bioformat-using-fiji-knime-command-line/51704/2

sbesson commented 2 years ago

This issue is also a duplicate of the problem originally described in https://trac.openmicroscopy.org/ome/ticket/13270.

From a recent investigation, this overwriting behavior applies to all readers that read OME-XML and convert it into the Metadata API, including OMEXMLReader. The MetadataTools helper methods overwrite the IDs of three elements:

One implication is that the IDs of the generated OME-XML are always compliant and follow the convention of being named after the series ID. As demonstrated in the Trac ticket linked above, if the element IDs are used elsewhere, e.g. as ImageRef in a Plate/Well/WellSample hierarchy, the references are not updated, which creates a broken metadata representation.
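
To make the failure mode concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Bio-Formats API: `IdOverwriteSketch`, `overwriteImageIds` and `danglingRefs` are hypothetical names, and the string lists are a toy stand-in for the Image IDs and the ImageRef values of a Plate/Well/WellSample hierarchy. It only illustrates what happens when IDs are renamed to a series-based `Image:<index>` form while the references are left untouched.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: not the Bio-Formats implementation.
public class IdOverwriteSketch {

  /** Replaces every image ID with a series-based ID of the form "Image:<index>". */
  static List<String> overwriteImageIds(List<String> originalIds) {
    List<String> rewritten = new ArrayList<>();
    for (int series = 0; series < originalIds.size(); series++) {
      rewritten.add("Image:" + series);
    }
    return rewritten;
  }

  /** Returns the ImageRef values that do not match any known image ID. */
  static List<String> danglingRefs(List<String> imageIds, List<String> imageRefs) {
    Set<String> known = new HashSet<>(imageIds);
    List<String> dangling = new ArrayList<>();
    for (String ref : imageRefs) {
      if (!known.contains(ref)) {
        dangling.add(ref);
      }
    }
    return dangling;
  }

  public static void main(String[] args) {
    // Original file: custom image IDs, referenced from WellSample elements.
    List<String> imageIds = List.of("Image:10", "Image:11");
    List<String> wellSampleRefs = List.of("Image:10", "Image:11");

    // Before the overwrite, every reference resolves.
    System.out.println(danglingRefs(imageIds, wellSampleRefs)); // []

    // After the overwrite, the IDs change but the references do not.
    List<String> rewritten = overwriteImageIds(imageIds);
    System.out.println(rewritten);                               // [Image:0, Image:1]
    System.out.println(danglingRefs(rewritten, wellSampleRefs)); // [Image:10, Image:11]
  }
}
```

A check along the lines of `danglingRefs` could also serve as a rough regression test for any fix: after reading a file, no reference should be left pointing at an ID that no longer exists.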

From a former discussion with the @ome/formats team, @melissalinkert raised the point that overwriting IDs might still be necessary for some examples in our curated repository, e.g. when the original XML is invalid and the current reader behavior attempts to correct some of these issues. Unilaterally removing this behavior might also cause regressions with existing data and make it unreadable.

As an intermediate solution, my current proposal would be to make the decision based on the validity of the original OME-XML as follows:

As a starting point, https://github.com/sbesson/bioformats/commit/88746ed2b8e2b1ded25d5f2ff52cca6e39a62f9e adds a few unit tests capturing the current behavior.