ome / bioformats

Bio-Formats is a Java library for reading and writing data in life sciences image file formats. It is developed by the Open Microscopy Environment. Bio-Formats is released under the GNU General Public License (GPL); commercial licenses are available from Glencoe Software.
https://www.openmicroscopy.org/bio-formats
GNU General Public License v2.0

Reduce DICOM write time #4164

Closed melissalinkert closed 1 week ago

melissalinkert commented 6 months ago

Opening as a draft since this is a work in progress, with additional changes to both reading and writing planned.

4e81795 should reduce the total time to run bfconvert with DICOM output. This is just buffering the header writes, and does not change the tile writing. A simple test conversion using https://downloads.openmicroscopy.org/images/Vectra-QPTIFF/perkinelmer/PKI_scans/HandEcompressed_Scan1.qptiff:

bfconvert -no-upgrade -noflat -tilex 256 -tiley 256 -compression JPEG HandEcompressed_Scan1.qptiff test.dcm

is noticeably faster with 4e81795 when testing locally.
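To illustrate why buffering header writes can help, here is a minimal sketch using only the Java standard library. It is not the code from 4e81795. The class `BufferedHeaderSketch`, the `CountingStream` helper, and the tag values are all hypothetical, and the sketch assumes the header is written as many small fields. It counts how many write calls reach the underlying sink with and without a `BufferedOutputStream` in between.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferedHeaderSketch {

  /** Hypothetical sink that counts how many write calls reach it. */
  static class CountingStream extends OutputStream {
    final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    int calls = 0;

    @Override
    public void write(int b) {
      calls++;
      sink.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) {
      calls++;
      sink.write(b, off, len);
    }
  }

  /** Writes made-up (group, element, value) triples as small fields. */
  static void writeTags(OutputStream out, int tagCount) throws IOException {
    DataOutputStream data = new DataOutputStream(out);
    for (int i = 0; i < tagCount; i++) {
      data.writeShort(0x0028); // hypothetical group number
      data.writeShort(i);      // hypothetical element number
      data.writeInt(i * 2);    // hypothetical value
    }
    data.flush();
  }

  /** Every small field write goes straight to the sink. */
  static int unbufferedCalls(int tagCount) throws IOException {
    CountingStream raw = new CountingStream();
    writeTags(raw, tagCount);
    return raw.calls;
  }

  /** Small field writes are collected in memory and flushed in bulk. */
  static int bufferedCalls(int tagCount) throws IOException {
    CountingStream raw = new CountingStream();
    BufferedOutputStream buffered = new BufferedOutputStream(raw, 8192);
    writeTags(buffered, tagCount);
    buffered.flush();
    return raw.calls;
  }

  public static void main(String[] args) throws IOException {
    System.out.println("unbuffered sink writes: " + unbufferedCalls(100));
    System.out.println("buffered sink writes:   " + bufferedCalls(100));
  }
}
```

When the sink is a file, each call that reaches it may translate into a separate I/O operation, so collapsing many small writes into a few large ones tends to reduce overhead. The actual gain depends on the writer and the storage involved.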

melissalinkert commented 1 month ago

Thanks, @sbesson. The NDPI difference may be due to picking more suitable default tile sizes, i.e. https://github.com/ome/bioformats/pull/4181/files#diff-32978063b81116de867b85238d17dd0f0a9fcc06c1e1689cb64bae18b7e3ee02L317.

99462118cc is a better try at reducing tile processing times. Trying CMU-1.svs and HandEcompressed_Scan1.qptiff locally with this change:

$ bfconvert -version
Version: 8.0.0-SNAPSHOT
Build date: 29 August 2024
VCS revision: 99462118cc80f1f8e9c6407f4f8825c82b301c24
$ time bfconvert -no-upgrade -noflat -compression JPEG HandEcompressed_Scan1.qptiff 4164/qptiff.dcm
HandEcompressed_Scan1.qptiff
VectraReader initializing HandEcompressed_Scan1.qptiff
Reading IFDs
Populating metadata
Populating OME metadata
[PerkinElmer Vectra/QPTIFF] -> 4164/qptiff.dcm [DICOM]
Tile size = 512 x 512
    Series 0: converted 1/1 planes (100%)
Tile size = 512 x 512
    Series 0: converted 1/1 planes (100%)
Tile size = 512 x 512
    Series 0: converted 1/1 planes (100%)
    Series 0: converted 1/1 planes (100%)
    Series 0: converted 1/1 planes (100%)
    Series 1: converted 1/1 planes (100%)
    Series 2: converted 1/1 planes (100%)
    Series 3: converted 1/1 planes (100%)
[done]
124.583s elapsed (354.25+14912.5ms per plane, 1735ms overhead)

real    2m5.605s
user    2m0.488s
sys     0m0.707s
$ time bfconvert -no-upgrade -noflat -compression JPEG CMU-1.svs 4164/cmu1.dcm
CMU-1.svs
SVSReader initializing CMU-1.svs
Reading IFDs
Populating metadata
Populating OME metadata
[Aperio SVS] -> 4164/cmu1.dcm [DICOM]
More than 4GB of pixel data, compression will need to be used
Tile size = 256 x 256
    Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
    Series 0: converted 1/1 planes (100%)
    Series 0: converted 1/1 planes (100%)
    Series 1: converted 1/1 planes (100%)
    Series 2: converted 1/1 planes (100%)
[done]
177.498s elapsed (196.2+34790.8ms per plane, 1809ms overhead)

real    2m58.588s
user    2m51.613s
sys     0m1.601s

and with current state of develop:

$ bfconvert -version
Version: 8.0.0-SNAPSHOT
Build date: 29 August 2024
VCS revision: 54cf1e8106592e9f3f44817d053e65db799dbaf9
$ time bfconvert -no-upgrade -noflat -compression JPEG HandEcompressed_Scan1.qptiff develop/qptiff.dcm
HandEcompressed_Scan1.qptiff
VectraReader initializing HandEcompressed_Scan1.qptiff
Reading IFDs
Populating metadata
Populating OME metadata
[PerkinElmer Vectra/QPTIFF] -> develop/qptiff.dcm [DICOM]
Tile size = 512 x 512
    Series 0: converted 1/1 planes (100%)
Tile size = 512 x 512
    Series 0: converted 1/1 planes (100%)
Tile size = 512 x 512
    Series 0: converted 1/1 planes (100%)
    Series 0: converted 1/1 planes (100%)
    Series 0: converted 1/1 planes (100%)
    Series 1: converted 1/1 planes (100%)
    Series 2: converted 1/1 planes (100%)
    Series 3: converted 1/1 planes (100%)
[done]
145.46s elapsed (358.125+17468.25ms per plane, 1961ms overhead)

real    2m26.473s
user    2m23.024s
sys     0m0.733s
$ time bfconvert -no-upgrade -noflat -compression JPEG CMU-1.svs develop/cmu1.dcm
CMU-1.svs
SVSReader initializing CMU-1.svs
Reading IFDs
Populating metadata
Populating OME metadata
[Aperio SVS] -> develop/cmu1.dcm [DICOM]
More than 4GB of pixel data, compression will need to be used
Tile size = 256 x 256
    Series 0: converted 1/1 planes (100%)
Tile size = 256 x 256
    Series 0: converted 1/1 planes (100%)
    Series 0: converted 1/1 planes (100%)
    Series 1: converted 1/1 planes (100%)
    Series 2: converted 1/1 planes (100%)
[done]
207.057s elapsed (218.2+40727.0ms per plane, 1630ms overhead)

real    3m28.065s
user    3m23.033s
sys     0m1.451s

which appears to be a ~14% reduction in elapsed time in both cases (124.6s vs 145.5s for the QPTIFF, 177.5s vs 207.1s for the SVS).
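For reference, the relative change can be recomputed from the "elapsed" figures in the logs above. This is a small sketch, with the hypothetical class name `ImprovementSketch`, that takes the develop timing as the baseline and the timing with 99462118cc as the new value.

```java
public class ImprovementSketch {

  /** Percentage by which 'after' is smaller than 'before'. */
  static double percentReduction(double before, double after) {
    return 100.0 * (before - after) / before;
  }

  public static void main(String[] args) {
    // Elapsed seconds copied from the bfconvert output above:
    // develop (54cf1e81) first, then this PR (99462118cc).
    System.out.printf("QPTIFF: %.1f%%%n", percentReduction(145.46, 124.583));
    System.out.printf("SVS:    %.1f%%%n", percentReduction(207.057, 177.498));
  }
}
```

These are single local runs, so the figures are indicative rather than a rigorous benchmark.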

joshmoore commented 3 weeks ago

A quick thought: is there any possibility that the DEBUG-level logging of StopWatch could slow things down in a GUI application, e.g. ImageJ or QuPath?

melissalinkert commented 3 weeks ago

In ImageJ, I see no issue - since the StopWatch logging is at DEBUG and the plugin uses INFO in a way that is not easy for a user to change (https://github.com/ome/bioformats/blob/develop/components/bio-formats-plugins/src/loci/plugins/LociImporter.java#L69), there shouldn't be any impact there.

In QuPath, I am having a hard time getting a test setup that includes this PR's changes, so I haven't evaluated it directly. As with ImageJ, the default log level is INFO, but this is easy to change. I think the worst case would be that someone has DEBUG/TRACE turned on and sees a problem, in which case we suggest turning it back to INFO.

If setting the StopWatches to TRACE instead of DEBUG would be preferable, I am fine with doing that.
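As a rough illustration of why the log level matters here, the sketch below uses `java.util.logging` as a stand-in. This is an assumption made to keep the example self-contained: it is not the StopWatch or logging setup used in this PR, and `Level.FINE` only approximates DEBUG. The class `DebugTimingSketch` and the method `messagesBuilt` are hypothetical. It counts how many timing messages are actually built when the logger is at INFO versus FINE.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class DebugTimingSketch {

  /** Returns how many timing messages were built at the given logger level. */
  static int messagesBuilt(Level loggerLevel, int tiles) {
    Logger log = Logger.getLogger("sketch.timing." + loggerLevel.getName());
    log.setUseParentHandlers(false); // keep the sketch silent
    log.setLevel(loggerLevel);

    AtomicInteger built = new AtomicInteger();
    for (int i = 0; i < tiles; i++) {
      long start = System.nanoTime();
      // ... per-tile work would happen here ...
      long elapsed = System.nanoTime() - start;
      final int tile = i;
      // The supplier only runs if FINE is enabled for this logger.
      log.log(Level.FINE, () -> {
        built.incrementAndGet();
        return "tile " + tile + " took " + elapsed + " ns";
      });
    }
    return built.get();
  }

  public static void main(String[] args) {
    System.out.println("built at INFO: " + messagesBuilt(Level.INFO, 1000));
    System.out.println("built at FINE: " + messagesBuilt(Level.FINE, 1000));
  }
}
```

With the level at INFO, the per-tile cost reduces to a level check, which is consistent with the expectation above that applications defaulting to INFO should see little impact.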

sbesson commented 2 weeks ago

@joshmoore any objection to merging this in its current state based on https://github.com/ome/bioformats/pull/4164#issuecomment-2327045988 (before you go on the road for a few weeks)?

joshmoore commented 2 weeks ago

No objections. Sorry. I didn't mean to hold this up.