ome / bioformats

Bio-Formats is a Java library for reading and writing data in life sciences image file formats. It is developed by the Open Microscopy Environment. Bio-Formats is released under the GNU General Public License (GPL); commercial licenses are available from Glencoe Software.
https://www.openmicroscopy.org/bio-formats
GNU General Public License v2.0

BioFormats DICOM WSI reader problem with nested Pixel Data in private data element and/or offset tables #4165

Closed dclunie closed 2 months ago

dclunie commented 3 months ago

One of my tools creates DICOM WSI images that are dual-personality TIFF, but also contain a "hidden" pyramid of TIFF sub-layers, using a private data element, so that ordinary TIFF readers have the pyramid they are expecting, even though DICOM spreads the pyramid over multiple files.

Normal DICOM readers ignore these hidden layers and use the Pixel Data element in the top-level data set.
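
For anyone unfamiliar with the "dual-personality" layout: the 128-byte DICOM preamble is free-form, so it can hold a TIFF header, while the `DICM` marker still sits at byte 128. Below is a minimal Python sketch of checking for both signatures. The function name `classify_header` is made up for illustration and is not part of Bio-Formats or of my tools.

```python
# Hypothetical helper: report which signatures the first bytes of a file carry.
TIFF_MAGICS = (
    b"II*\x00",  # classic TIFF, little-endian (version 42)
    b"MM\x00*",  # classic TIFF, big-endian
    b"II+\x00",  # BigTIFF, little-endian (version 43)
    b"MM\x00+",  # BigTIFF, big-endian
)


def classify_header(header: bytes) -> dict:
    """Check the first 132 bytes of a file for TIFF and DICOM Part 10 markers."""
    return {
        # A TIFF reader only looks at the start of the file.
        "tiff": header[:4] in TIFF_MAGICS,
        # A DICOM reader skips the 128-byte preamble and expects "DICM".
        "dicom": len(header) >= 132 and header[128:132] == b"DICM",
    }
```

Usage would be something like `classify_header(open(path, "rb").read(132))`; a dual-personality file reports `True` for both keys.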

Unfortunately, adding these seems to confuse the BioFormats reader.

See the samples at:

https://www.dropbox.com/scl/fo/za255deqf1rrmmwqsv5lv/h?rlkey=b3uow8ormdaz0tr4q7g3u5yyl

with screenshots of QuPath (5) using BioFormats and OpenSlide readers.

I am just guessing that the nested Pixel Data in the private data element is the problem, since if I don't include it, the BioFormats reader works OK. It may be something else.

These test images also contain both the Basic Offset Table and Extended Offset Tables with pointers to the actual pixel data frame Items. I am testing including these to allow faster reading when present. Note that the standard says that only one or the other should be present, not both, so these are just for testing. Not sure if BioFormats takes advantage of the presence of either.
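
To make the offset tables concrete, here is a small Python sketch of how a reader could turn either table into absolute frame positions. It assumes the usual encoding: the Basic Offset Table is the value of the first Item in the encapsulated Pixel Data and holds 32-bit little-endian unsigned offsets, the Extended Offset Table holds 64-bit ones, and both count from the first byte of the first Item tag that follows the Basic Offset Table. The function names and the `first_fragment_pos` parameter are hypothetical, and this is not a description of what BioFormats does.

```python
import struct


def parse_offset_table(raw: bytes, entry_size: int) -> list:
    """Decode little-endian unsigned offsets.

    entry_size is 4 for a Basic Offset Table and 8 for an Extended Offset Table.
    An empty table (raw == b"") yields an empty list.
    """
    if entry_size not in (4, 8):
        raise ValueError("entry_size must be 4 or 8")
    if len(raw) % entry_size != 0:
        raise ValueError("table length is not a multiple of the entry size")
    code = "I" if entry_size == 4 else "Q"
    count = len(raw) // entry_size
    return list(struct.unpack("<%d%s" % (count, code), raw))


def frame_item_positions(offsets, first_fragment_pos: int) -> list:
    """Convert table offsets into absolute file positions of each frame's Item tag.

    first_fragment_pos (an assumed input) is the file position of the first
    Item tag after the Basic Offset Table item.
    """
    return [first_fragment_pos + offset for offset in offsets]
```

With the positions in hand, a reader can seek straight to frame `n` instead of walking every fragment Item before it.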

melissalinkert commented 3 months ago

Thanks, @dclunie. With 7.2.0 and either `showinf -noflat -resolution 2 img_0.dcm` or `showinf -nogroup img_3.dcm`, I can see an image with tiles that are obviously incorrect.

The nested Pixel Data does seem to be the problem, and in particular we'll need to update https://github.com/ome/bioformats/blob/develop/components/formats-bsd/src/loci/formats/dicom/DicomTag.java#L312 and the surrounding lines. That code assumes that Pixel Data is never nested within a Sequence, so when it finds Pixel Data within the current Sequence it stops and assumes the Sequence ended immediately before the Pixel Data. I don't have a fix quite yet, but am adding this issue to the next milestone so that it gets prioritized.
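
As an illustration of the alternative, here is a rough Python sketch that walks a data set by following Item and Sequence delimiters and records the nesting depth of every Pixel Data element, so a nested one is not mistaken for the top-level one. This is not the code in DicomTag.java and not necessarily the shape of the eventual fix. It assumes Explicit VR Little Endian, a buffer that starts at the first data element after the file meta information, and a private element encoded with VR SQ; all function names are hypothetical.

```python
import struct

LONG_VRS = {b"OB", b"OD", b"OF", b"OL", b"OV", b"OW", b"SQ",
            b"SV", b"UC", b"UN", b"UR", b"UT", b"UV"}
UNDEFINED = 0xFFFFFFFF
ITEM_DELIM = (0xFFFE, 0xE00D)
SEQ_DELIM = (0xFFFE, 0xE0DD)
PIXEL_DATA = (0x7FE0, 0x0010)


def read_header(buf, pos):
    """Return (tag, vr, length, value_pos) for the element starting at pos."""
    tag = struct.unpack_from("<HH", buf, pos)
    if tag[0] == 0xFFFE:  # Item and delimiter tags carry no VR
        (length,) = struct.unpack_from("<I", buf, pos + 4)
        return tag, None, length, pos + 8
    vr = buf[pos + 4:pos + 6]
    if vr in LONG_VRS:  # 2 reserved bytes, then a 4-byte length
        (length,) = struct.unpack_from("<I", buf, pos + 8)
        return tag, vr, length, pos + 12
    (length,) = struct.unpack_from("<H", buf, pos + 6)
    return tag, vr, length, pos + 8


def scan_items(buf, pos, length, depth, found, parse_items):
    """Walk the Items of a Sequence (or the fragments of encapsulated pixels)."""
    end = None if length == UNDEFINED else pos + length
    while end is None or pos < end:
        tag, _, item_len, value_pos = read_header(buf, pos)
        if tag == SEQ_DELIM:
            return value_pos
        if parse_items:  # Sequence Item: contains a nested data set
            item_end = None if item_len == UNDEFINED else value_pos + item_len
            pos = scan_dataset(buf, value_pos, item_end, depth, found)
        else:  # pixel fragment: always has an explicit length
            pos = value_pos + item_len
    return pos


def scan_dataset(buf, pos, end, depth, found):
    """Walk elements until end, or until an Item Delimitation Item if end is None."""
    while end is None or pos < end:
        tag, vr, length, value_pos = read_header(buf, pos)
        if tag == ITEM_DELIM:
            return value_pos
        if tag == PIXEL_DATA:
            found.append((depth, pos))
        if vr == b"SQ":
            pos = scan_items(buf, value_pos, length, depth + 1, found, True)
        elif length == UNDEFINED:  # encapsulated Pixel Data
            pos = scan_items(buf, value_pos, length, depth + 1, found, False)
        else:
            pos = value_pos + length
    return pos


def find_pixel_data(buf):
    """Return (depth, position) for every Pixel Data element; depth 0 is top level."""
    found = []
    scan_dataset(buf, 0, len(buf), 0, found)
    return found
```

A reader would then pick the entry with depth 0 for the image it displays and ignore the deeper ones.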

For internal testing, files here are now in inbox/gh-4165.