ome / bioformats

Bio-Formats is a Java library for reading and writing data in life sciences image file formats. It is developed by the Open Microscopy Environment. Bio-Formats is released under the GNU General Public License (GPL); commercial licenses are available from Glencoe Software.
https://www.openmicroscopy.org/bio-formats
GNU General Public License v2.0

Andor SIF: Support files larger than 4 GB and with non-fixed-size footers #4195

WalkerKnapp opened this pull request 3 weeks ago (status: Open)

WalkerKnapp commented 3 weeks ago

Hello! Our lab generates a large number of Andor SIF files that fail to load with Bio-Formats: an IllegalArgumentException is thrown when reading the first frame. As it turns out, this is the result of two issues: files larger than 4 GB are not handled, and footers whose size is not fixed are not handled.

Output of `tail` for one of our Andor SIF files:

```
D�D�D�CD�DD��C� D�C��C��C@ D�D��C�C��C�C@ DD@D@D��C��C��C�DD��C��C��C��C�C�D DDD�C�C��C�CD�D��C�D@ DD�D@D�D�D@D��CD�D��CD�D��C@D��C�C��C�D�C�D��C�D�C D��CD�D�D@D�D�C��C�C��C�D�C�C�CD�D@D�D�C�C�C�D��C�C0 0 0 0 Gain 1 381 0 -1 0 -1 X-14188 -999 0 0 0 -1 0 0 0 0 -1 0 -1 10 0 0 �SIFX%
```

Both of these issues can be solved by parsing the entire SIF header, instead of relying on the positioning of the footer to determine where pixel data starts. I modified the header parsing based on @fujiisoup's sif_parser (https://github.com/fujiisoup/sif_parser/blob/a922fc299b749057f66bedaba3b6971e18f94c4e/sif_parser/_sif_open.py), which is able to open our files without issue.
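A toy sketch of why deriving the pixel-data offset from the end of the file is fragile when the footer length varies, while taking it from the end of a fully parsed header is not. The function names, the 8-byte "expected" footer, the float32 frame size and all other numbers are hypothetical, chosen only for illustration; this is not the Bio-Formats reader or the sif_parser code.

```python
# Hypothetical sizes for illustration only; real SIF headers and footers differ.
ASSUMED_FOOTER = 8  # footer length the footer-based approach expects

def offset_from_footer(file_size: int, frame_bytes: int, n_frames: int) -> int:
    """Work backwards from the end of the file, assuming a fixed-size footer."""
    return file_size - ASSUMED_FOOTER - frame_bytes * n_frames

def offset_from_header(header_length: int) -> int:
    """Pixel data starts immediately after a fully parsed header."""
    return header_length

# Hypothetical file: 3000-byte header, ten 512x512 frames of 4-byte pixels.
header_len, frame_bytes, n_frames = 3000, 512 * 512 * 4, 10
results = {}
for actual_footer in (ASSUMED_FOOTER, 120):  # expected footer vs. a longer one
    file_size = header_len + frame_bytes * n_frames + actual_footer
    results[actual_footer] = (
        offset_from_footer(file_size, frame_bytes, n_frames),
        offset_from_header(header_len),
    )
    print(actual_footer, results[actual_footer])
```

With the expected footer both approaches agree (offset 3000); with a 120-byte footer the footer-based offset comes out 112 bytes too late (3112), so every frame would be read shifted, while the header-based offset is unaffected.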

Opening this as a draft for now, because I am not sure whether it works with all SIF file versions. Feedback, and reports from others testing their own SIF files, would be appreciated!

Specifically, on line 163, I believe parsing will fail when the first two bytes of image data are 0x300A. As far as I understand, this would only happen if the top-left pixel on frame 0 contains a value on the order of 5e-10. I can't think of a scenario where you would have measurements that small, but I would be interested to see if others could come up with one!
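To see where the 5e-10 figure can come from, here is a small check. It assumes the pixel data are IEEE 754 single-precision floats and that 0x30, 0x0A are the two most significant bytes of the value; both are assumptions for illustration, not taken from the PR's code.

```python
import struct

def float32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Smallest and largest float32 values whose two most significant bytes are 0x30 0x0A.
low = float32_from_bits(0x300A0000)
high = float32_from_bits(0x300AFFFF)
print(f"{low:.4e} to {high:.4e}")  # 5.0204e-10 to 5.0568e-10

# Stored little-endian, the same value places 0x0A 0x30 in its last two bytes.
print(struct.pack("<f", low).hex())  # 00000a30
```

Under these assumptions only values between roughly 5.02e-10 and 5.06e-10 would match. If the check instead compares the first two bytes in file order of little-endian data, those are the low-order mantissa bits, and the constraint on the pixel value would be quite different.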

dgault commented 2 weeks ago

Thanks @WalkerKnapp for opening the initial draft. If you are able to provide some sample files demonstrating the issue, that would greatly help with testing. If you need a suitable upload location, we recommend using https://zenodo.org/.

Also, we have a Contributor License Agreement for the project. Would you be able to sign and return the form, following the instructions on https://ome-contributing.readthedocs.io/en/latest/cla.html?

The first step for testing and review will be to move this PR out of draft status; we will then include it in the CI and our daily test suite. This will test for any regressions in the existing sample files that we have.

WalkerKnapp commented 2 weeks ago

Completed a CLA! Regarding publicly hosting a sample file, I will work on getting that approved (there are questions of rights and funding that are above my pay grade). In the meantime, I can also share example files from our university's cluster with anyone interested who has a Globus account, if that would be helpful.

dgault commented 2 weeks ago

Thanks @WalkerKnapp, I can confirm that we have received your CLA. There was only a single failure in the nightly tests, and I suspect it may actually reflect a fix that requires us to update our configs. Unfortunately that file isn't public, but I will investigate and confirm whether that is the case.

dgault commented 5 days ago

Hi @WalkerKnapp, I was wondering if there was any update on potentially getting a public sample file for this PR? If that is not possible then we can arrange to have it shared via Globus.