ome / bioformats

Bio-Formats is a Java library for reading and writing data in life sciences image file formats. It is developed by the Open Microscopy Environment. Bio-Formats is released under the GNU General Public License (GPL); commercial licenses are available from Glencoe Software.
https://www.openmicroscopy.org/bio-formats
GNU General Public License v2.0

OBF file produces multiple 1-C images rather than 1 multi-C image #3969

Open joshmoore opened 1 year ago

joshmoore commented 1 year ago

An attendee at TiM2023 (Dr. Sarah Schweighofer) had issues both when importing her OBF file into OMERO and when converting it to OME-TIFF with NGFF-Converter. The same behavior is seen in ImageJ.

see:

cc: @melissalinkert

melissalinkert commented 1 year ago

From a quick look, 4 Images with 1 channel each appears to be correct in terms of what the metadata in the file describes. OBF uses OME-XML metadata, and in this case the file defines 4 separate Images, each with a single channel.
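One way to confirm this kind of layout is to count the Channel elements under each Image in the embedded OME-XML. The sketch below uses only the JDK's DOM parser on a simplified, hypothetical OME-XML string (no namespaces, no real OBF data); the class and method names are illustrative, not part of Bio-Formats.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class OmeChannelCounter {

    /** Maps each Image ID to the number of Channel elements nested under it (hypothetical helper). */
    public static Map<String, Integer> channelsPerImage(String omeXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(omeXml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Integer> counts = new LinkedHashMap<>();
            NodeList images = doc.getElementsByTagName("Image");
            for (int i = 0; i < images.getLength(); i++) {
                Element image = (Element) images.item(i);
                // In OME-XML, Channel elements live under Image/Pixels; this counts all descendants.
                counts.put(image.getAttribute("ID"), image.getElementsByTagName("Channel").getLength());
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse OME-XML", e);
        }
    }

    public static void main(String[] args) {
        // Simplified stand-in for the layout described in the thread: 4 Images, 1 Channel each.
        StringBuilder xml = new StringBuilder("<OME>");
        for (int i = 0; i < 4; i++) {
            xml.append("<Image ID=\"Image:").append(i).append("\"><Pixels ID=\"Pixels:").append(i)
               .append("\"><Channel ID=\"Channel:").append(i).append(":0\"/></Pixels></Image>");
        }
        xml.append("</OME>");
        System.out.println(channelsPerImage(xml.toString())); // {Image:0=1, Image:1=1, Image:2=1, Image:3=1}
    }
}
```

A reader that trusts this metadata will naturally expose 4 single-channel series rather than one 4-channel image.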

cc @ngladitz, who has done most of the recent work on OBF support

ngladitz commented 1 year ago

As far as I understand it, this has historically always been the case (i.e. since before I started at Abberior) and may have been partially due to how we stream/write our image data on some systems. We have been discussing internally the possibility of changing this (potentially writing image data interleaved) or heuristically collating image data in the Bio-Formats reader.
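To illustrate what "heuristically collating" might mean, here is a minimal sketch that groups single-channel series into candidate multi-channel images when they share the same X/Y/Z/T sizes and pixel type. Everything here is hypothetical (the `SeriesInfo` record, the grouping key, the example names); it is not how the Bio-Formats OBF reader works, and a real heuristic would likely also compare physical pixel sizes, stage positions, and acquisition settings before merging.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChannelCollator {

    /** Hypothetical summary of one single-channel image series found in a file. */
    public record SeriesInfo(String name, int sizeX, int sizeY, int sizeZ, int sizeT, String pixelType) {}

    /** Series that agree on all of these fields are assumed to be channels of one image. */
    private record ShapeKey(int sizeX, int sizeY, int sizeZ, int sizeT, String pixelType) {}

    /** Returns groups of series; each group would become the channels of one image, in file order. */
    public static List<List<SeriesInfo>> collate(List<SeriesInfo> series) {
        Map<ShapeKey, List<SeriesInfo>> groups = new LinkedHashMap<>();
        for (SeriesInfo s : series) {
            ShapeKey key = new ShapeKey(s.sizeX(), s.sizeY(), s.sizeZ(), s.sizeT(), s.pixelType());
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        List<SeriesInfo> series = List.of(
                new SeriesInfo("dye A", 512, 512, 1, 1, "uint16"),
                new SeriesInfo("dye B", 512, 512, 1, 1, "uint16"),
                new SeriesInfo("overview", 128, 128, 1, 1, "uint16"));
        for (List<SeriesInfo> group : collate(series)) {
            // Prints "2 channel(s): [dye A, dye B]" then "1 channel(s): [overview]"
            System.out.println(group.size() + " channel(s): " + group.stream().map(SeriesInfo::name).toList());
        }
    }
}
```

Grouping purely by shape is cheap but can wrongly merge unrelated acquisitions that happen to have the same dimensions, which is one reason such a heuristic is a harder call than writing interleaved data in the first place.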

The OME metadata we write can already be enormous and doesn't scale well (it is uncompressed and highly redundant).

On newer systems we started using the channel dimension for our MATRIX detectors (e.g. 23 channels, I think) and the modulo extension of the channel dimension for lifetime (e.g. 250 bins), giving us a total of e.g. 5750 channels. We also use modulo T for tiling, which can lead to a great many OME Plane elements.
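The 5750 figure is simply 23 detector channels times 250 lifetime bins packed into one C dimension. The sketch below shows the index arithmetic for that packing, assuming lifetime bins vary fastest within C (so C = detector × bins + bin). That ordering is an assumption made for illustration; the class and method names are hypothetical.

```java
public class ModuloChannelIndex {

    /** Total size of C when each detector channel is subdivided into lifetime bins. */
    public static int totalChannels(int detectorChannels, int lifetimeBins) {
        return Math.multiplyExact(detectorChannels, lifetimeBins);
    }

    /** Flat C index for (detector, bin), assuming bins vary fastest within C. */
    public static int flatIndex(int detector, int bin, int lifetimeBins) {
        if (bin < 0 || bin >= lifetimeBins) {
            throw new IllegalArgumentException("bin out of range: " + bin);
        }
        return detector * lifetimeBins + bin;
    }

    /** Inverse of flatIndex: returns {detector, bin} for a flat C index. */
    public static int[] split(int flatC, int lifetimeBins) {
        return new int[] {flatC / lifetimeBins, flatC % lifetimeBins};
    }

    public static void main(String[] args) {
        int detectors = 23, bins = 250;              // example sizes quoted in the thread
        System.out.println(totalChannels(detectors, bins)); // 5750
        int c = flatIndex(22, 249, bins);            // last detector, last lifetime bin
        System.out.println(c);                       // 5749
        int[] parts = split(c, bins);
        System.out.println(parts[0] + ", " + parts[1]); // 22, 249
    }
}
```

Because only the product reaches the C dimension, a consumer that ignores the modulo annotation sees 5750 undifferentiated channels, which is part of the case for more native (non-modulo) dimensions.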

As far as I understand the specification, we should already have 5750 OME Channel elements for this (maybe MATRIX channels could/should be Channel/@SamplesPerPixel, but I don't know if Bio-Formats/ImageJ would approve). However, as I understand it, someone once saw an example of a multi-channel dataset with a single representative OME Channel and took it as a loophole, so our metadata currently contains only a single OME Channel.
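To give a rough sense of the overhead being weighed, the sketch below emits one minimal Channel element per channel, using the conventional `Channel:<image>:<channel>` ID pattern and the `SamplesPerPixel` attribute. Real Channel elements typically carry more attributes (names, wavelengths, detector settings), so this deliberately simplified, hypothetical generator understates the true size; it only shows that the text grows linearly with the channel count.

```java
public class ChannelElementCost {

    /** One minimal OME-XML Channel element; real files usually include many more attributes. */
    public static String channelElement(int imageIndex, int channelIndex) {
        return "<Channel ID=\"Channel:" + imageIndex + ":" + channelIndex + "\" SamplesPerPixel=\"1\"/>";
    }

    /** Concatenates one Channel element per channel, as the schema expects for sizeC channels. */
    public static String allChannels(int imageIndex, int sizeC) {
        StringBuilder sb = new StringBuilder();
        for (int c = 0; c < sizeC; c++) {
            sb.append(channelElement(imageIndex, c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Compare a single representative Channel with one Channel per channel (5750, as in the thread).
        System.out.println("1 Channel element:     " + allChannels(0, 1).length() + " chars");
        System.out.println("5750 Channel elements: " + allChannels(0, 5750).length() + " chars");
    }
}
```

Every element repeats the same attribute names and mostly identical values, which is the redundancy that a more compact schema representation for repetitive metadata would avoid.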

While this is already questionable, it would be an even harder sell if we were to start using the channel dimension for dyes as well (and I guess it might not work at all unless the MATRIX and lifetime channels are the same for all dyes/detectors). This is why we are reluctant to change this behavior until/unless we find a way to do it with less metadata overhead, preferably one that doesn't cause too large a break with existing consuming software.

If OME does intend to extend the schema at some point (I guess the chances aren't good, seeing as the last revision is apparently 7 years old), I suppose I'd ask for, among other things, more native (non-modulo) dimensions and compact, less redundant representations for potentially repetitive metadata.