ecc521 opened this issue 3 years ago
If you install the indexed-gzip package, you should get performance improvements for free.
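The speed-up from indexed-gzip comes from building an index of seek points, so reads can resume partway through the compressed stream instead of decompressing from byte 0 every time. The toy below is a stdlib-only, in-memory illustration of that idea. It is not indexed_gzip's code: as I understand it, indexed_gzip follows zlib's zran.c example and stores 32 KiB windows. `CheckpointedGzip` is a hypothetical name, and the `spacing`/`chunk` values are arbitrary assumptions.

```python
import bisect
import zlib


class CheckpointedGzip:
    """Hypothetical toy random-access reader over an in-memory, single-member gzip stream.

    One full pass decompresses the stream and saves a checkpoint roughly every
    `spacing` uncompressed bytes: (uncompressed offset, compressed offset,
    snapshot of the zlib decompressor state). A later read resumes from the
    nearest checkpoint at or before the requested offset, not from byte 0.
    """

    def __init__(self, compressed, spacing=64 * 1024, chunk=4096):
        self._data = compressed
        self._chunk = chunk
        self._offsets = []  # uncompressed offsets of checkpoints, ascending
        self._states = []   # (compressed offset, decompressor snapshot)
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: gzip header/trailer
        in_pos = out_pos = 0
        next_mark = 0
        while in_pos < len(compressed):
            if out_pos >= next_mark:
                self._offsets.append(out_pos)
                self._states.append((in_pos, d.copy()))
                next_mark = out_pos + spacing
            piece = compressed[in_pos:in_pos + chunk]
            in_pos += len(piece)
            out_pos += len(d.decompress(piece))
        self.size = out_pos  # total uncompressed length

    def read(self, offset, length):
        # Pick the last checkpoint whose uncompressed offset is <= offset.
        i = bisect.bisect_right(self._offsets, offset) - 1
        skip = offset - self._offsets[i]
        in_pos, snapshot = self._states[i]
        d = snapshot.copy()  # copy again so the stored snapshot stays reusable
        out = bytearray()
        while len(out) < skip + length and in_pos < len(self._data):
            piece = self._data[in_pos:in_pos + self._chunk]
            in_pos += len(piece)
            out += d.decompress(piece)
        return bytes(out[skip:skip + length])
```

The cost of a read is now bounded by the checkpoint spacing plus the requested length, rather than by the offset. The price is one full decompression pass up front plus the memory for the snapshots.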
Thanks! @effigies
Looking at the indexed-gzip docs, I found the flag: keep_file_open=True.
indexed-gzip alone does help, but setting that flag is all that is needed for this use case.
Still confused as to why keep_file_open is off by default, but enabling it seems to be a solution.
Unless the defaults or the relevant documentation (the slicer section mentions neither) need to be revisited to make keep_file_open and indexed-gzip more visible, I'm happy to close this.
With indexed-gzip, you should not need to set keep_file_open to get almost identical performance.
The reason it's off by default is that, when working with many files, you can exhaust file handle quotas, and the lifetimes of file handles are difficult to reason about.
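To make the trade-off concrete, here is a toy model using only the standard gzip module. `ToySlabProxy` is a hypothetical class, not nibabel's ArrayProxy; it only mimics the keep_file_open idea. With a fresh handle per access, each slab read must inflate everything before it, because GzipFile implements a forward seek by reading and discarding data. A forward pass over n slabs therefore costs O(n²). With one kept handle the same pass is O(n), but the caller now owns the handle's lifetime and must close it.

```python
import gzip


class ToySlabProxy:
    """Hypothetical stand-in for a lazy array proxy over a gzipped file.

    Slab k occupies bytes [header + k*slab, header + (k+1)*slab) of the
    uncompressed stream. `decompressed` is a model count based on GzipFile's
    seek semantics (forward seek = read and discard, backward seek = restart
    from 0); it ignores GzipFile's internal read-ahead buffering.
    """

    def __init__(self, path, header_bytes, slab_bytes, keep_file_open=False):
        self.path = path
        self.header_bytes = header_bytes
        self.slab_bytes = slab_bytes
        self.keep_file_open = keep_file_open
        self.decompressed = 0
        self._gz = None

    def _handle(self):
        if not self.keep_file_open:
            return gzip.open(self.path, "rb")  # fresh handle, positioned at 0
        if self._gz is None:
            self._gz = gzip.open(self.path, "rb")  # one handle, reused until close()
        return self._gz

    def __getitem__(self, k):
        offset = self.header_bytes + k * self.slab_bytes
        gz = self._handle()
        try:
            pos = gz.tell()
            # Forward seek inflates (offset - pos) bytes; backward seek rewinds to 0.
            skipped = offset - pos if offset >= pos else offset
            self.decompressed += skipped + self.slab_bytes
            gz.seek(offset)
            return gz.read(self.slab_bytes)
        finally:
            if not self.keep_file_open:
                gz.close()

    def close(self):
        if self._gz is not None:
            self._gz.close()
            self._gz = None

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

Reading n slabs in order inflates about n*header + slab*n(n+1)/2 bytes with per-access handles, versus header + n*slab with a kept handle. Using the kept-handle variant as a context manager is one way to make the handle's lifetime explicit.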
Definitely good to update the docs.
I have some rather large gzipped NIfTI files that I need to read without first loading them fully into memory, so I read them in slices.
When I load an image via nibabel.load, slicer and dataobj appear to re-read the file from the beginning on each access, so the total time grows quadratically with the number of slices taken.
Since I'm only moving forward through the file, the time complexity should be linear. Indeed, linear time can be obtained by updating the code from an old question:
In this case, since GzipFile preserves the current decompression state, proceeding strictly forward in the file works at the expected speed (much faster).
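The code from the old question isn't reproduced here. What follows is a minimal stdlib-only sketch of the same forward-reading approach, under stated assumptions: a single-file NIfTI-1 image (.nii.gz), whole-byte datatypes, and no handling of scl_slope/scl_inter scaling or NIfTI-2 headers. `iter_last_axis_slabs` is a hypothetical helper name. NIfTI stores voxels in Fortran (column-major) order, so each index along the last axis is one contiguous slab of the data block.

```python
import gzip
import struct

# NIfTI-1 header fields used here (byte offsets within the 348-byte header):
#   0: sizeof_hdr (int32, always 348)    40: dim[8] (int16)
#  72: bitpix (int16)                   108: vox_offset (float32)


def iter_last_axis_slabs(path):
    """Yield the raw bytes of each slab along the last axis of a gzipped NIfTI-1 file.

    A single GzipFile stays open and is read strictly forward, so every slab
    is decompressed exactly once and total work is linear in the file size.
    """
    with gzip.open(path, "rb") as gz:
        hdr = gz.read(348)
        # sizeof_hdr is always 348, so it reveals the header's byte order.
        endian = "<" if struct.unpack("<i", hdr[:4])[0] == 348 else ">"
        dim = struct.unpack(endian + "8h", hdr[40:56])
        bitpix = struct.unpack(endian + "h", hdr[72:74])[0]
        vox_offset = int(struct.unpack(endian + "f", hdr[108:112])[0])
        shape = dim[1:1 + dim[0]]
        slab_bytes = bitpix // 8  # assumes bitpix is a multiple of 8
        for n in shape[:-1]:
            slab_bytes *= n
        gz.seek(vox_offset)  # forward seek past any header extensions
        for _ in range(shape[-1]):
            yield gz.read(slab_bytes)
```

Each yielded slab could then be decoded with numpy.frombuffer using the dtype and byte order implied by the header's datatype field, and reshaped to shape[:-1] with order="F".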
Is there a way to get the same slicing performance through the nibabel.load() API (for example, by passing a GzipFile)? That would be greatly preferable, as it abstracts away file formats.