ome / bioformats

Bio-Formats is a Java library for reading and writing data in life sciences image file formats. It is developed by the Open Microscopy Environment. Bio-Formats is released under the GNU General Public License (GPL); commercial licenses are available from Glencoe Software.
https://www.openmicroscopy.org/bio-formats
GNU General Public License v2.0

Change in CellSens 4.1 causing exception in VSI format reader. #4116

Closed ed-scanlon closed 2 months ago

ed-scanlon commented 7 months ago

I am getting an exception reading a set of Olympus VSI files produced by CellSens software 4.1 with BioFormats 7.0.1.

This problem looks similar to, but is not the same as, the issue discussed in Issue #3859 (VSI: Exception loading files from VS200 scanner with software version 3.4.1), which was fixed in PR #3925 (Olympus .vsi: only read pixels from frame_*.ets files). This is a new problem with a different cause; the version of BioFormats I am using includes the aforementioned fix.

I have two sample VSI files, each with its associated frame_*.ets files, produced by CellSens version 4.1. Both files exhibit the same symptoms. We are presently seeking permission to share these two images. When executing "ImageInfo -nopix -no-upgrade", this exception is thrown:

Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
        at java.util.ArrayList.rangeCheck(ArrayList.java:659)
        at java.util.ArrayList.get(ArrayList.java:435)
        at loci.formats.in.CellSensReader.parseETSFile(CellSensReader.java:1272)

The offending code line is: HashMap<String, Integer> dimOrder = pyramids.get(s).dimensionOrdering;

In this exception, the 'pyramids' array has one element, and 's' is 1.

I have done a deep dive into the logic to find the cause of this problem, and here is what I've discovered. I'll start by recapping the relevant logic in CellSensReader.java. The 'pyramids' array is built while parsing the metadata in the .vsi file; a new element is added to the array each time an EXTERNAL_FILE_PROPERTIES tag is encountered. Apparently each instance of this tag introduces a new metadata parameter block that is associated with one individual external frame_*.ets file, and the block provides all the information needed to describe the image stored in that file. The logic appears to assume that there is exactly one frame_*.ets file for each EXTERNAL_FILE_PROPERTIES tag, and vice versa.

It is worth noting that the code correlates each EXTERNAL_FILE_PROPERTIES tag with its frame_*.ets file solely by sequence. The sequence of the frame_*.ets files is established by sorting, with the folder name as the primary sort key and the frame_*.ets file name as the secondary key. The frame_*.ets files are then matched one by one with the EXTERNAL_FILE_PROPERTIES tags, in the order in which the tags are encountered in the .vsi metadata.
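To make the sequence-based matching concrete, here is a minimal sketch of the idea, not the actual CellSensReader code. The class name, method names, and folder paths are all hypothetical; only the sort keys (folder name, then file name) and the one-to-one pairing by position come from the description above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; class, method, and path names are hypothetical,
// not taken from CellSensReader.
class EtsSequenceMatching {

  /** Name of the folder that directly contains the file ("" if there is none). */
  static String folderName(String path) {
    File parent = new File(path).getParentFile();
    return parent == null ? "" : parent.getName();
  }

  /** Sorts by containing folder name first, then by file name. */
  static List<String> sortEtsFiles(List<String> paths) {
    List<String> sorted = new ArrayList<>(paths);
    sorted.sort(Comparator
      .comparing((String p) -> folderName(p))
      .thenComparing(p -> new File(p).getName()));
    return sorted;
  }

  /** Pairs sorted file s with metadata block s; returns the files left without a block. */
  static List<String> filesWithoutBlock(List<String> sortedFiles, int blockCount) {
    List<String> unmatched = new ArrayList<>();
    for (int s = blockCount; s < sortedFiles.size(); s++) {
      // An unguarded lookup of block s at this position is out of range.
      unmatched.add(sortedFiles.get(s));
    }
    return unmatched;
  }

  public static void main(String[] args) {
    List<String> files = sortEtsFiles(Arrays.asList(
      "_img_/stack3/frame_t.ets", "_img_/stack1/frame_t.ets",
      "_img_/stack4/frame_t.ets", "_img_/stack2/frame_t.ets"));
    // One metadata block, four files: three files have no block.
    System.out.println(filesWithoutBlock(files, 1));
  }
}
```

With one metadata block and four files, every file after the first falls outside the range of available blocks, which corresponds to the `Index: 1, Size: 1` failure in the stack trace.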

In the case of my two failing images, the EXTERNAL_FILE_PROPERTIES tag appears in the metadata exactly once, yet each of the failing .vsi images has four associated frame_*.ets files! I have examined the metadata in the .vsi files extensively to verify that this is true.

The proximate cause of the exception is that there are more frame_*.ets files than there are EXTERNAL_FILE_PROPERTIES tags. Apparently some change in the latest version (V4.1) of the scanner software causes, or at least allows, this situation to occur. Besides the exception, the mismatch between the number of EXTERNAL_FILE_PROPERTIES tags and the number of frame_*.ets files raises another problem: it is now ambiguous how to associate each EXTERNAL_FILE_PROPERTIES metadata block with the correct frame_*.ets file. For my two sample images, I have examined the image parameters (specifically pixel width and height) in the metadata block of the one EXTERNAL_FILE_PROPERTIES tag that is present, and it is clear that this block is NOT associated with the first frame_*.ets file (based on the present sort order) but with the second. Thus, for any work-around to be successful, it would not suffice to simply ignore the frame_*.ets files that have no associated EXTERNAL_FILE_PROPERTIES tag; it would first be necessary to establish a reliable way to associate each EXTERNAL_FILE_PROPERTIES metadata block with the appropriate frame_*.ets file.

With all the above investigation completed, I am creating a PR for a proposed code change for a work-around that will:

  1. Attempt to correctly extract as much pixel data as possible from the .vsi and frame_*.ets files using the information available.
  2. Not alter the present behavior of the code for any .vsi file whose count of frame_*.ets files matches the number of EXTERNAL_FILE_PROPERTIES metadata blocks.

The logic of the work-around does this:

  1. When there are more frame_*.ets files than EXTERNAL_FILE_PROPERTIES metadata blocks, it attempts to associate each frame_*.ets file with the appropriate metadata block by using the image's pixel width and height from the metadata block.
  2. For performing this association, the pixel width and height of the highest resolution image in each frame_*.ets file can be calculated to within one tile boundary. This is done by examining the tile coordinates of all tiles in the file to find the maxX and maxY coordinate at the highest resolution (in a similar manner to logic already in the program).
  3. If the pixel width and height from the metadata block match the calculated width and height of the frame_*.ets file to within one tile boundary, the two are associated.
  4. If a frame_*.ets file does not have an associated metadata block, it is handled as usual but without any associated metadata. At minimum, the pixel width and height of the image series within the frame_*.ets file are required; these are filled in from the width and height calculated from the tile coordinates during the matching process.
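
Steps 2 and 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: the names are hypothetical, tile coordinates are assumed to be zero-based tile indices, and "within one tile boundary" is interpreted as "the metadata size fits inside the tile-covered extent with less than one tile of padding".

```java
// Illustrative sketch only; names and the exact tolerance rule are assumptions.
class EtsDimensionMatch {

  /** Pixel extent covered by the tiles along one axis: (largest tile index + 1) * tile size. */
  static int coveredExtent(int[] tileIndices, int tileSize) {
    int max = 0;
    for (int index : tileIndices) {
      max = Math.max(max, index);
    }
    return (max + 1) * tileSize;
  }

  /** True if the metadata size fits in the covered extent with less than one tile of padding. */
  static boolean withinOneTile(int metadataSize, int coveredExtent, int tileSize) {
    int padding = coveredExtent - metadataSize;
    return padding >= 0 && padding < tileSize;
  }

  /** True if both width and height from a metadata block match the tile-derived extents. */
  static boolean matches(int metaWidth, int metaHeight,
      int[] tileXs, int[] tileYs, int tileWidth, int tileHeight) {
    return withinOneTile(metaWidth, coveredExtent(tileXs, tileWidth), tileWidth)
      && withinOneTile(metaHeight, coveredExtent(tileYs, tileHeight), tileHeight);
  }

  public static void main(String[] args) {
    int[] tileXs = {0, 1, 2, 3, 0, 1, 2, 3};  // tile columns at the highest resolution
    int[] tileYs = {0, 0, 0, 0, 1, 1, 1, 1};  // tile rows at the highest resolution
    // Tiles cover 2048 x 1024 pixels with 512-pixel tiles.
    System.out.println(matches(1900, 900, tileXs, tileYs, 512, 512));  // block fits
    System.out.println(matches(5000, 900, tileXs, tileYs, 512, 512));  // block too wide
  }
}
```

Because the last row and column of tiles are usually only partly filled, the tile coordinates alone pin the true image size down to a one-tile range, which is why the comparison needs a tolerance rather than an exact match.
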
ed-scanlon commented 6 months ago

Hi @melissalinkert, @sbesson, after performing additional testing we found that, although this PR provides an adequate work-around for the crash in ImageInfo, it leaves an unresolved crash in ImageConverter. Here's the problem: the strategy of including the images from the extraneous .ets files in the series list is not only unnecessary but also causes ImageConverter to crash. It has become clear that these "orphan" frame_*.ets files have no value and really should be ignored. I have posted a second commit to the PR to address that problem. This change alters the original work-around logic, specifically Step 4 above, so that when a .ets file has no associated metadata block, its contents are omitted from the series list, and the file is treated basically the same way as the blob*.meta files, which is to say, its name is included in the 'extraFiles' list, but it is otherwise ignored.
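
The revised Step 4 amounts to a simple partition, sketched below. The class and field names other than 'extraFiles' are hypothetical, and the `blockIndex` array (one entry per .ets file, -1 meaning "no metadata block matched") is an assumed representation, not the structure used in the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not the code in the PR.
class OrphanEtsHandling {
  final List<String> seriesFiles = new ArrayList<>();  // files that contribute image series
  final List<String> extraFiles = new ArrayList<>();   // files listed by name but not read

  /** blockIndex[i] is the metadata block matched to etsFiles.get(i), or -1 for an orphan. */
  OrphanEtsHandling(List<String> etsFiles, int[] blockIndex) {
    for (int i = 0; i < etsFiles.size(); i++) {
      if (blockIndex[i] >= 0) {
        seriesFiles.add(etsFiles.get(i));
      } else {
        extraFiles.add(etsFiles.get(i));
      }
    }
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList("a.ets", "b.ets", "c.ets", "d.ets");
    // Only the second file matched the single metadata block (index 0).
    OrphanEtsHandling result = new OrphanEtsHandling(files, new int[] {-1, 0, -1, -1});
    System.out.println("series: " + result.seriesFiles);
    System.out.println("extra:  " + result.extraFiles);
  }
}
```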

ed-scanlon commented 4 months ago

Hi @melissalinkert, @sbesson, I have just posted another commit to the vsi_workaround branch. This change addresses the failure case that Melissa identified, where ImageConverter may still crash when a VSI file has multiple orphan ETS files with the same or very similar dimensions. The crash occurred because multiple orphan ETS files were erroneously being associated with the same 'pyramid' metadata structure. The logic added in this commit simply ensures that no 'pyramid' metadata structure is associated with more than one ETS file. Since I do not have a copy of the offending VSI file, I would ask Melissa to retest with that file. Thank you!
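
The one-to-one constraint can be sketched as below. This is a hypothetical illustration rather than the commit itself: the `compatible` matrix stands in for the dimension check, and giving a block to the first compatible file in sort order is an assumption, since the tie-breaking rule is not described here.

```java
import java.util.Arrays;

// Illustrative sketch only; not the code in the commit.
class OneToOneAssociation {

  /**
   * compatible[f][b] is true when ETS file f has dimensions matching metadata block b.
   * Returns, for each file, the block it claimed, or -1 if it remains an orphan.
   */
  static int[] associate(boolean[][] compatible, int blockCount) {
    int[] assigned = new int[compatible.length];
    Arrays.fill(assigned, -1);
    boolean[] claimed = new boolean[blockCount];
    for (int f = 0; f < compatible.length; f++) {
      for (int b = 0; b < blockCount; b++) {
        if (compatible[f][b] && !claimed[b]) {
          assigned[f] = b;
          claimed[b] = true;  // a block can never be handed to a second file
          break;
        }
      }
    }
    return assigned;
  }

  public static void main(String[] args) {
    // Four ETS files, one metadata block; files 1 and 2 have near-identical dimensions.
    boolean[][] compatible = {{false}, {true}, {true}, {false}};
    System.out.println(Arrays.toString(associate(compatible, 1)));  // prints [-1, 0, -1, -1]
  }
}
```

In this example the third file has matching dimensions but stays an orphan because the only block has already been claimed by the second file.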