Closed by NicoKiaru 1 year ago
All the time is spent in NIOByteBufferProvider.allocate
Was that screenshot of the profile for a smaller file which completed, or just a sub-section of the init time for the larger file? We would probably need to do some thorough profiling to confirm whether that is the only bottleneck.
At the minute the Bio-Formats reader saves the startPosition for each segment, and when reading each one it will open a stream and seek to that position. It may be possible to avoid a lot of that seeking, but I would need to check to confirm, and it is probably not a small change.
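The seek-per-segment pattern described above can be sketched roughly as follows (the names `SegmentRef` and `readSegmentPayloads` are illustrative, not the actual Bio-Formats API):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the access pattern: each segment is remembered only
// by its start position, and reading it back costs one seek per segment.
public class SegmentSeekSketch {

    static class SegmentRef {
        final long startPosition;
        final int length;
        SegmentRef(long startPosition, int length) {
            this.startPosition = startPosition;
            this.length = length;
        }
    }

    // Reading N segments this way costs N seeks; sorting the refs by
    // startPosition first would at least make the access pattern sequential.
    static List<byte[]> readSegmentPayloads(RandomAccessFile in, List<SegmentRef> refs)
            throws IOException {
        List<byte[]> payloads = new ArrayList<>();
        for (SegmentRef ref : refs) {
            in.seek(ref.startPosition);       // one seek per segment
            byte[] buf = new byte[ref.length];
            in.readFully(buf);
            payloads.add(buf);
        }
        return payloads;
    }
}
```

On spinning disks or network file systems, those per-segment seeks dominate; batching or ordering them is where the savings would come from.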
Hello @dgault ,
Was that screenshot of the profile for a smaller file which completed, or just a sub-section of the init time for the larger file?
The profiling was for a sub-section of the init time of a large file, which did not complete. Happy to help if possible: I have a gigantic TB-scale file, but I can make a manageable subset of it and share it. I also have a branch with a few logs here and there to test the opening speed.
I think the problem comes from the reader reading all blocks of the file linearly. For TB-sized files, that is not efficient. I'm digging through the current reader implementation and through the specs at https://zeiss.github.io/libczi/index.html, and I'll try to come up with more efficient indexing by using the information located in the sub-block directory. That's probably going to be painful given the size and use cases of this reader, but I'll give it a shot.
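The indexing idea sketched in one function: jump straight to the sub-block directory position stored in the file header instead of scanning every segment. This is not Bio-Formats code; the payload offset below is my reading of the libczi spec and should be double-checked before relying on it:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hedged sketch: locate the sub-block directory without a linear scan.
public class CziDirectoryIndexSketch {

    // 16-byte segment ID + 8-byte allocated size + 8-byte used size.
    static final int SEGMENT_HEADER_SIZE = 32;
    // Assumed offset of DirectoryPosition within the ZISRAWFILE header
    // payload (per my reading of the spec; verify against libczi).
    static final int DIRECTORY_POSITION_OFFSET = 52;

    static long readSubBlockDirectoryPosition(RandomAccessFile in) throws IOException {
        in.seek(SEGMENT_HEADER_SIZE + DIRECTORY_POSITION_OFFSET);
        byte[] buf = new byte[8];
        in.readFully(buf);
        // CZI stores integers little-endian.
        return ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }
}
```

With that position in hand, the directory entries (one per sub-block, each carrying its file position and dimension bounds) could be parsed in a single sequential read instead of touching every segment.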
Thanks @NicoKiaru, if you think you have a potential solution feel free to open the PR and we can get it tested against the existing datasets that we have
This reader is very complicated... I'm pretty sure I won't be able to do something as general as what it currently supports.
Do you think one of you could detail the logic behind the `openBytes` method a little? I really have a hard time understanding what's happening.
Most of the scary-looking logic in `openBytes` is for handling whole slide data and/or the results of tile stitching. In these cases there will be a bunch of "extra" subblocks that `openBytes` shouldn't care about - the subblocks stored in the files are not just the pixel data tiles alone, there will be additional ones that define (but don't have pixel data for) each pyramid resolution etc. `openBytes` is also only reading pixel data from subblocks that represent tiles within the requested region, but for whole slide/tiled data the number of tiles to read isn't predictable for a fixed area. Tiles often overlap unpredictably, which is another part of why `openBytes` iterates over every `SubBlock`.
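The iteration described above can be illustrated with a toy sketch (this is not the actual reader code): every sub-block is visited, but pixel data is only read from tiles that intersect the requested region, and because tiles overlap there is no closed-form count that would let the scan be skipped:

```java
import java.util.List;

// Illustrative model of region/tile intersection during a read.
public class TileIntersectSketch {

    static class Tile {
        final int x, y, w, h;
        Tile(int x, int y, int w, int h) { this.x = x; this.y = y; this.w = w; this.h = h; }
    }

    // Standard axis-aligned rectangle overlap test.
    static boolean intersects(Tile t, int rx, int ry, int rw, int rh) {
        return t.x < rx + rw && rx < t.x + t.w && t.y < ry + rh && ry < t.y + t.h;
    }

    // Returns how many tiles would actually contribute pixels to the region.
    static int tilesToRead(List<Tile> tiles, int rx, int ry, int rw, int rh) {
        int n = 0;
        for (Tile t : tiles) {                       // visit every sub-block
            if (intersects(t, rx, ry, rw, rh)) n++;  // read only intersecting ones
        }
        return n;
    }
}
```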
All of this obviously varies quite a bit based on the imaging modality and actual acquisition settings. You'll note that at various points throughout the reader there are some special cases for specific imaging types (`isPALM`, `scanDim`/`validScanDim` for some tiled data, etc.). If there is a reliable way to detect lightsheet data from the metadata, a similar special case in `initFile`/`openBytes` might make things easier for now.
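A hypothetical sketch of such a special case, purely for illustration - the metadata value checked here (an acquisition-mode string containing "lightsheet") is a guess, and the real CZI XML metadata would need to be inspected for a reliable marker:

```java
// Hypothetical detection helper; the metadata key/value is an assumption.
public class LightsheetDetectSketch {
    static boolean isLightsheet(String acquisitionMode) {
        return acquisitionMode != null
            && acquisitionMode.toLowerCase().contains("lightsheet");
    }
}
```

In `initFile`, a flag like this could route lightsheet files down a simpler path that skips the tile-stitching and pyramid bookkeeping they don't need.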
Quick update on this issue: there is (or will be) an alternative reader which should parse the metadata faster. Its implementation is at https://github.com/BIOP/quick-start-czi-reader. It will be added as an external reader as soon as this PR is accepted.
Long-term goal: if the new reader works well and can open CZI files as well as, and in a similar enough way to, the old reader, the new one may replace the old one.
I do not think it's necessary to keep this issue open; see the message above.
Hello everybody,
We recently acquired an LLS system from Zeiss, and a CZI file can go up to a TB without much difficulty (for instance 2 channels, 1024 x 450 pixels, 2000 slices, 300 timepoints).
It's big, but such gigantic stacks can normally be opened virtually in ImageJ without too many issues.
The problem comes from the initialisation of the ZeissCZIReader: it just takes forever. I don't know how long it took for a TB file; the only thing I know is that it is less than a night, because I opened the file in the evening and it was open the next morning...
I started to monitor the performance of the reader, and the first bottleneck is in readSegments(id). Maybe it is the only bottleneck; I was not patient enough to let it finish. A quick estimate of the readSegments method execution time for my TB file is ~3 hours. Do you know if there's a way to avoid this long step at the beginning? Do you know how the reader used by Zen works? Is there a possibility to mimic it?
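As a back-of-envelope sanity check on that estimate (my assumed throughput, not a measurement): a linear pass over the whole file is bounded by disk bandwidth, so ~1 TB at a typical ~100 MB/s sequential read rate lands in the same ballpark as ~3 hours:

```java
// Rough throughput-bound estimate for a full linear scan of a file.
public class ScanTimeEstimate {
    static double hoursToScan(double fileBytes, double bytesPerSecond) {
        return fileBytes / bytesPerSecond / 3600.0;
    }
}
```

Which suggests the init time really is dominated by reading the whole file, and that only an index-based approach (rather than micro-optimisation) can get close to Zen's speed.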
ping @sebi06 @dgault
Opening the file in Zen takes around 5 seconds.