ome / bioformats

Bio-Formats is a Java library for reading and writing data in life sciences image file formats. It is developed by the Open Microscopy Environment. Bio-Formats is released under the GNU General Public License (GPL); commercial licenses are available from Glencoe Software.
https://www.openmicroscopy.org/bio-formats
GNU General Public License v2.0
379 stars 241 forks source link

OIRReader: Major performance improvement when reading a ROI within a plane #4205

Closed NicoKiaru closed 3 weeks ago

NicoKiaru commented 3 months ago

In the implementation before this PR each time a small ROI is read from an image, a full OIR Pixelblock was read. Combined with the fact that the Optimal Tile Size was not overriden and defaulted to a thin stripe in y (128 pixels), this could lead to a particularly bad scenario for big images:

The proposed new implementation of this PR:

When requesting a cropped region within big images, the speedup is massive.

Here's a real time display with big dataviewer with the current implementation: previousreader

Here's a real time display with big dataviewer with the implementation in this PR: prreader

imagesc-bot commented 3 months ago

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/why-is-bioformats-reading-large-oir-files-so-much-slower-than-the-microscope-companion-software-provided-by-olympus/66922/6

dgault commented 3 months ago

Thanks again @NicoKiaru for the hard work investigating and getting the PR opened. I have marked it for inclusion for the repository tests to check for any side effects in other datasets.

NicoKiaru commented 3 months ago

Thanks, I have tested it on very few files, so while I tried to keep the changes to the minimum, I'm not sure I did not break something! Let's see.

jburel commented 3 months ago

Failed with the following errors

  [testng] java.lang.NullPointerException: Cannot assign field "yStart" because "block" is null
   [testng]     at loci.formats.in.OIRReader.initFile(OIRReader.java:432)
   [testng]     at loci.formats.FormatReader.setId(FormatReader.java:1480)
   [testng] java.lang.ArrayIndexOutOfBoundsException: Index 1023 out of bounds for length 1023
at loci.formats.in.OIRReader.initFile(OIRReader.java:420)
melissalinkert commented 3 months ago

Note that some of the test failures occur on public data:

NicoKiaru commented 2 months ago

So I've fixed what I could test with the public files.

dgault commented 2 months ago

Added a commit to fix a NullPointerException in the nightly tests (https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/18844/console)

dgault commented 2 months ago

Added a further commit to address failures in the nightly tests: https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/19008/console

These last 2 failures related to spectral images using modulo C. I have updated the indexing for the blocks to match the previous behaviour.

dgault commented 2 months ago

I am still seeing failures for the modulo C files in the nightly tests, though from testing it looks as though the original implementation currently in the reader may also be incorrect. In both cases a single set of pixel blocks is being used for each of the modulo C frames, the tests fail because this PR is defaults to a different frame.

The real issue seems to be that in the case of the Lambda files as set here, that I am unsure how to identify from the pixel block uids which index they correctly belong to. Currently both implementations are matching the channel id which is the same for each step and results in the same frames each time. I will try and dig further to see if I can correctly identify the correct blocks.

NicoKiaru commented 2 months ago

Ok, good luck with that. If you have a public lambda file, I could also investigate. But if not I can also try to create one and make it public.

NicoKiaru commented 2 months ago

I've submitted some example file to the bio-formats QA community in Zenodo. They cover:

EDIT: Zenodo repo: https://zenodo.org/records/12773657

And it looks like none of these files are properly opened by the current reader (and I'm pretty sure this PR does not improve the situation).

NicoKiaru commented 2 months ago

I've made a small modification, which, in the files I could test, fixes the reading of spectral acquisition.

NicoKiaru commented 1 month ago

Hello, do you think you would be able to give it a (new) go ? I've shared some sample spectral dataset + the previous version was not able to read spectral dataset from FV4000. Of course there may still be issued with private files, but I can't fix anything in that regard.

melissalinkert commented 1 month ago

Thanks for the updates, @NicoKiaru, and apologies for the delay. This will now be included in tonight's builds, so we'll know the status in the morning.

melissalinkert commented 4 weeks ago

This appears to have passed all tests (https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/27045/console), so is on my list to review for inclusion in 8.0.0.

NicoKiaru commented 4 weeks ago

Thanks! Before doing that please wait a day or two: I want to upload such a big 2D dataset since I'm not sure you have that in the demo files.

NicoKiaru commented 4 weeks ago

Ok, I've submitted a tiled image on Zenodo bio-formats submission community.

I think to demo the performance increase I should rather have done a single very large plane than a z-stack. But timing the opening with a 'crop on import' the perf increase should still be visible I hope.

joshmoore commented 4 weeks ago

see https://zenodo.org/records/13680725

NicoKiaru commented 3 weeks ago

Thanks a lot @melissalinkert for this thorough review! I'll get back to it next week.

NicoKiaru commented 3 weeks ago

I've done the modifications you suggested. Thanks again @melissalinkert !