imi-bigpicture / wsidicom

Python package for reading DICOM WSI file sets.
Apache License 2.0

Slow Initialization #92

Closed by John-P 1 year ago

John-P commented 1 year ago

I've got a couple of large DICOM WSI images where calling WsiDicom.open is VERY slow. I've tracked it down to read_dataset in the init, which then calls filereader.py:41(data_element_generator) and then filereader.py:461(read_sequence). This file has nearly 180,000 frames, which could be why it is so slow. However, there must surely be a faster way to initialize.

I am on wsidicom 0.4.0, pydicom 2.3.1 and python 3.11.
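
For anyone wanting to reproduce the measurement, here is a minimal sketch of profiling a single call with the standard-library cProfile and pstats modules. The helper name profile_call is hypothetical and not part of wsidicom; in this case the callable would be WsiDicom.open with the path to the slide folder.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report lists the `top` entries
    sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


# Hypothetical usage (requires wsidicom and a real slide folder):
#   from wsidicom import WsiDicom
#   slide, report = profile_call(WsiDicom.open, "path/to/slide")
#   print(report)
```

Sorting by cumulative time puts the outermost slow call first, which makes it easier to see which part of the initialization dominates.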

John-P commented 1 year ago

Here is a relevant extract of the cProfile stats:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
...
        1    0.000    0.000   93.116   93.116 /home/john/.../site-packages/wsidicom/instance.py:1144(__init__)
 536157/5    3.507    0.000   90.216   18.043 /home/john/.../site-packages/pydicom/filereader.py:358(read_dataset)
        1    0.000    0.000   90.216   90.216 /home/john/.../site-packages/pydicom/filereader.py:738(read_partial)
3038256/87   12.119    0.000   90.216    1.037 /home/john/.../site-packages/pydicom/filereader.py:41(data_element_generator)
357441/10    1.650    0.000   90.212    9.021 /home/john/.../site-packages/pydicom/filereader.py:461(read_sequence)
893593/178732    4.948    0.000   89.188    0.000 /home/john/.../site-packages/pydicom/filereader.py:497(read_sequence_item)

erikogabrielsson commented 1 year ago

Hi @John-P and thanks for the report.

DICOM WSI files can be formatted in a couple of different ways, and some of them are unfortunately slow to open. The properties that can differ are the tile arrangement (fully or sparsely tiled) and whether an offset table is included.

So given your issue with slow initialization, I'm guessing that your files are sparsely tiled, and possibly without an offset table.
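
A rough way to check this guess is to look at a few standard DICOM attributes. The sketch below assumes a dataset object that exposes DICOM keywords as attributes (a pydicom Dataset does); the function name describe_tiling is hypothetical. It only detects the Extended Offset Table attribute. The basic offset table is stored inside the encapsulated pixel data and is not covered here.

```python
from types import SimpleNamespace


def describe_tiling(dataset):
    """Summarize tiling-related attributes of a DICOM WSI dataset.

    `dataset` is any object exposing DICOM keywords as attributes,
    for example a pydicom Dataset.
    """
    organization = getattr(dataset, "DimensionOrganizationType", None)
    return {
        # Only TILED_FULL implies the tile positions; otherwise they are
        # listed per frame in PerFrameFunctionalGroupsSequence.
        "fully_tiled": organization == "TILED_FULL",
        "frames": int(getattr(dataset, "NumberOfFrames", 1)),
        "has_extended_offset_table": hasattr(dataset, "ExtendedOffsetTable"),
    }


# Stand-in object for illustration; with pydicom one might instead use
#   dataset = pydicom.dcmread(path, stop_before_pixels=True)
# which can itself be slow on sparsely tiled files with many frames.
sparse = SimpleNamespace(
    NumberOfFrames="180000",
    DimensionOrganizationType="TILED_SPARSE",
)
print(describe_tiling(sparse))
```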

I'm working (#93) on changing to lazy loading of the frame positions, meaning that the PerFrameFunctionalGroupsSequence attribute and frame offsets will not be parsed until a frame from the file is requested. This makes, for example, opening a WSI and reading out a thumbnail much faster (as the lower levels are not loaded). Requesting a tile from the base level will, however, still be slow.
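
To illustrate the general idea of lazy loading (this is not the actual implementation in #93), here is a small sketch using functools.cached_property. The class LazyFrameIndex and the slow_parse function are hypothetical stand-ins: the expensive parsing is deferred until the first frame position is requested and is then cached.

```python
from functools import cached_property


class LazyFrameIndex:
    """Hypothetical stand-in for one level file: the expensive position
    parsing runs on first access, not in __init__."""

    def __init__(self, load_positions):
        # load_positions: callable returning a list of (column, row) tuples
        self._load_positions = load_positions

    @cached_property
    def positions(self):
        # Runs once, on first use; the result is cached on the instance.
        return self._load_positions()

    def position_of(self, frame_index):
        return self.positions[frame_index]


calls = []


def slow_parse():
    # Stands in for parsing the per-frame positions of a large file.
    calls.append("parsed")
    return [(0, 0), (1, 0), (0, 1)]


index = LazyFrameIndex(slow_parse)  # cheap: nothing has been parsed yet
```

With this pattern, opening many levels stays cheap, and the parsing cost is only paid for a level whose frames are actually read.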

Further work could be to:

However, our recommendation is to convert the files to fully tiled with an offset table. This is lossless and fast (once the file has been opened...):

from wsidicom import WsiDicom

# Open the existing files and save them again as a new file set.
with WsiDicom.open(path_to_folder) as slide:
    slide.save(path_to_output)

John-P commented 1 year ago

Thanks, that sounds like a decent solution for the time being.

erikogabrielsson commented 1 year ago

You can check whether the recently released 0.9.0 gives faster initialization with your files.

erikogabrielsson commented 1 year ago

@John-P did 0.9.0 give faster initialization, or is this still a problem?