Closed John-P closed 1 year ago
Here is a relevant extract of the cProfile stats:
ncalls tottime percall cumtime percall filename:lineno(function)
...
1 0.000 0.000 93.116 93.116 /home/john/.../site-packages/wsidicom/instance.py:1144(__init__)
536157/5 3.507 0.000 90.216 18.043 /home/john/.../site-packages/pydicom/filereader.py:358(read_dataset)
1 0.000 0.000 90.216 90.216 /home/john/.../site-packages/pydicom/filereader.py:738(read_partial)
3038256/87 12.119 0.000 90.216 1.037 /home/john/.../site-packages/pydicom/filereader.py:41(data_element_generator)
357441/10 1.650 0.000 90.212 9.021 /home/john/.../site-packages/pydicom/filereader.py:461(read_sequence)
893593/178732 4.948 0.000 89.188 0.000 /home/john/.../site-packages/pydicom/filereader.py:497(read_sequence_item)
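For anyone wanting to produce a comparable listing for their own files, here is a minimal sketch using only the standard library. The helper name profile_call and the stand-in workload slow_sum are hypothetical; in practice the profiled callable would be whatever opens the slide.

```python
import cProfile
import io
import pstats


def profile_call(func, limit=10):
    """Run func() under cProfile and return (result, report text).

    The report is sorted by cumulative time, like the extract above.
    profile_call is a hypothetical helper, not part of wsidicom or pydicom.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()


def slow_sum():
    # Stand-in workload; replace with the call that opens the slide.
    return sum(i * i for i in range(100_000))


result, report = profile_call(slow_sum)
print(report)
```

Sorting by cumulative time puts the outermost slow call first, which is how the chain from the instance initializer down to read_sequence_item shows up in the extract.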
Hi @John-P and thanks for the report.
DICOM WSI files can be formatted in a couple of different ways, and some of them are unfortunately slow to open. The properties that can differ are:
- Tiling: If there is a DimensionOrganizationType attribute in the dataset and its value is TILED_FULL, the file is fully tiled and much faster to parse. If there is no DimensionOrganizationType attribute, or if it is TILED_SPARSE, the file is sparsely tiled. That means that the position of each frame is given in the PerFrameFunctionalGroupsSequence attribute, which is slow to parse.
- Offset table: The start of each frame can be given by an offset table, such as the ExtendedOffsetTable attribute. If there is no offset table, the whole PixelData element must be parsed to determine where each frame starts, which is slow.
So given your issue with slow initialization, I'm guessing that your files are sparsely tiled, and possibly without an offset table.
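As a rough way to check which case a file falls into, the sketch below summarises the two properties from a dataset-like object. The helper describe_layout is hypothetical (not a wsidicom or pydicom function), and it assumes only that the object has a dict-style get(keyword, default), which holds for a plain dict and for a pydicom Dataset.

```python
from typing import Any, Dict


def describe_layout(dataset: Any) -> Dict[str, bool]:
    """Summarise the tiling and offset table properties discussed above.

    describe_layout is a hypothetical helper for illustration only.
    """
    organization = dataset.get("DimensionOrganizationType", None)
    # A missing attribute or TILED_SPARSE means the frame positions are
    # stored in PerFrameFunctionalGroupsSequence.
    fully_tiled = organization == "TILED_FULL"
    # Only the ExtendedOffsetTable attribute is checked here. A basic offset
    # table is stored inside PixelData and is not visible this way.
    has_extended_offsets = dataset.get("ExtendedOffsetTable", None) is not None
    return {
        "fully_tiled": fully_tiled,
        "has_extended_offset_table": has_extended_offsets,
    }


# Plain dicts stand in for real datasets in this example.
sparse_example = {"DimensionOrganizationType": "TILED_SPARSE"}
full_example = {
    "DimensionOrganizationType": "TILED_FULL",
    "ExtendedOffsetTable": b"\x00" * 8,
}
sparse_layout = describe_layout(sparse_example)
full_layout = describe_layout(full_example)
missing_layout = describe_layout({})
print(sparse_layout, full_layout, missing_layout)
```

With pydicom, the dataset could come from pydicom.dcmread(path, stop_before_pixels=True). Note that reading a sparsely tiled file this way still parses the per-frame sequence, so the check itself may be slow on such files.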
I'm working (#93) on changing to lazy loading of the frame positions, meaning that the PerFrameFunctionalGroupsSequence attribute and frame offsets will not be parsed until a frame from the file is requested. This makes, for example, opening a WSI and reading out a thumbnail much faster (as the lower levels are not loaded). Requesting a tile from the base level will, however, still be slow.
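The general idea of lazy loading can be sketched as below. This is an illustration of the pattern only, not the code in #93: the class LazyFramePositions and the stand-in parser fake_parse are hypothetical names, and the expensive step is simulated.

```python
from functools import cached_property
from typing import Callable, List, Tuple


class LazyFramePositions:
    """Defer an expensive parse until the positions are first needed."""

    def __init__(self, parse: Callable[[], List[Tuple[int, int]]]):
        self._parse = parse
        self.parse_count = 0  # only here to show when parsing happens

    @cached_property
    def positions(self) -> List[Tuple[int, int]]:
        # Runs once, on first access; the result is cached afterwards.
        self.parse_count += 1
        return self._parse()

    def position_of(self, frame_index: int) -> Tuple[int, int]:
        return self.positions[frame_index]


def fake_parse() -> List[Tuple[int, int]]:
    # Stand-in for parsing PerFrameFunctionalGroupsSequence.
    return [(column * 512, 0) for column in range(4)]


level = LazyFramePositions(fake_parse)
count_after_open = level.parse_count  # nothing parsed on construction
first = level.position_of(1)
second = level.position_of(3)
count_after_reads = level.parse_count  # parsed exactly once
print(count_after_open, first, second, count_after_reads)
```

Opening stays cheap because construction does no parsing, but the first request against a level still pays the full parsing cost, which matches the caveat about base-level tiles.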
Further work could be to:
- Not load PerFrameFunctionalGroupsSequence into memory until needed.
- [...] PerFrameFunctionalGroupsSequence [...]
However, our recommendation is to convert the files to fully tiled including offset table. This is lossless and fast (once the file has been opened...)
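from wsidicom import WsiDicom
# path_to_folder and path_to_output are placeholders for your own paths.
with WsiDicom.open(path_to_folder) as slide:
    slide.save(path_to_output)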
Thanks, that sounds like a decent solution for the time being.
You can check whether the recently released 0.9.0 gives faster initialization with your files.
@John-P did 0.9.0 give faster initialization, or is this still a problem?
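I've got a couple of large DICOM WSI images where calling WsiDicom.open is VERY slow. I've tracked it down to read_dataset in the init, which then calls filereader.py:41(data_element_generator) and then filereader.py:461(read_sequence). This file has nearly 180,000 frames, which could be why this is so slow. However, there must surely be a faster way to initialize.
I am on wsidicom 0.4.0, pydicom 2.3.1 and Python 3.11.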
. This file has nearly 180,000 frames, which could be why this is so slow. However, there must surely be a faster way to initialize.I am on wsidicom 0.4.0, pydicom 2.3.1 and python 3.11.