mxmlnkn / ratarmount

Access large archives as a filesystem efficiently, e.g., TAR, RAR, ZIP, GZ, BZ2, XZ, ZSTD archives
MIT License
915 stars 39 forks

Use pread #100

Open mxmlnkn opened 1 year ago

mxmlnkn commented 1 year ago

Currently, StenciledFile and other locations use locks to avoid race conditions between seek and read calls. I'm not sure why I never thought of pread before, but it might improve performance for multi-threaded access, and even if it does not, it should simplify the code by removing the lock.
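A minimal sketch of the difference, assuming a regular file on a POSIX system. The helper names `read_at_locked` and `read_at_pread` are hypothetical and are not taken from ratarmount's code:

```python
import os
import tempfile
import threading


def read_at_locked(file_object, lock, offset, size):
    """Seek and read share the file position, so both must happen under one lock."""
    with lock:
        file_object.seek(offset)
        return file_object.read(size)


def read_at_pread(file_descriptor, offset, size):
    """os.pread takes the offset as an argument and leaves the file position alone.

    Note that it may return fewer bytes than requested, e.g., at the end of the file.
    """
    return os.pread(file_descriptor, size, offset)


with tempfile.NamedTemporaryFile() as temporary_file:
    temporary_file.write(b"0123456789")
    temporary_file.flush()

    with open(temporary_file.name, "rb") as file_object:
        lock = threading.Lock()
        position_before = file_object.tell()
        pread_result = read_at_pread(file_object.fileno(), 3, 4)
        position_after = file_object.tell()
        locked_result = read_at_locked(file_object, lock, 3, 4)
```

Both helpers return the same bytes, but only the locked variant depends on shared state (the file position), which is why it needs the lock.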

mxmlnkn commented 1 year ago

One reason why os.pread is not always possible is recursion. In those cases, the reading has to be done on pure Python file-like objects, for which pread cannot be used. However, it should still be possible to implement it as a fast path for real files.
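Such a fast path could look roughly like the following sketch. `PositionalReader` and `get_pread_descriptor` are hypothetical names used only for illustration. The check is deliberately strict: some wrappers such as `gzip.GzipFile` also expose `fileno()`, but it refers to the underlying compressed file, so testing for `fileno()` alone would not be safe.

```python
import io
import os
import tempfile
import threading


def get_pread_descriptor(file_object):
    """Return a file descriptor usable with os.pread, or None if there is none."""
    if not hasattr(os, "pread"):  # e.g., on Windows
        return None
    # Unwrap the buffered reader returned by open(path, "rb").
    raw = file_object.raw if isinstance(file_object, io.BufferedReader) else file_object
    # Only plain, seekable OS-level files qualify (no pipes, no Python wrappers).
    if isinstance(raw, io.FileIO) and raw.seekable():
        return raw.fileno()
    return None


class PositionalReader:
    """Hypothetical helper: os.pread for real files, lock + seek + read otherwise."""

    def __init__(self, file_object):
        self.file_object = file_object
        self.file_descriptor = get_pread_descriptor(file_object)
        self.lock = threading.Lock()

    def read_at(self, offset, size):
        if self.file_descriptor is not None:
            # Fast path: no shared file position, so no lock is needed.
            return os.pread(self.file_descriptor, size, offset)
        # Fallback for pure Python file-like objects.
        with self.lock:
            self.file_object.seek(offset)
            return self.file_object.read(size)


in_memory_reader = PositionalReader(io.BytesIO(b"0123456789"))
in_memory_result = in_memory_reader.read_at(2, 3)

with tempfile.NamedTemporaryFile() as temporary_file:
    temporary_file.write(b"0123456789")
    temporary_file.flush()
    with open(temporary_file.name, "rb") as real_file:
        real_file_reader = PositionalReader(real_file)
        real_file_result = real_file_reader.read_at(2, 3)
```

The `io.BytesIO` object takes the locked fallback, while the file opened from disk takes the `os.pread` path where that function is available.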

mxmlnkn commented 2 months ago

This becomes even more important in my Lustre benchmarks because Lustre advertises a block size of 4 MiB, which causes Python's buffered I/O reader to read 4 MiB chunks even if only 1 KiB is needed. Using os.pread (when available, i.e., not on Windows) would also skip the buffering. The buffering should probably also be turned off independently of this change, but for that to work reasonably fast and not cause 8 KiB accesses to Lustre, ratarmount also needs to forward the block size from the opened file!
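For illustration, the following sketch opens a file without Python-level buffering, queries the block size that the file system advertises, and reads exactly 1 KiB. `get_block_size` is a hypothetical helper, and how ratarmount would actually forward that value is not shown here. The built-in `open` chooses its buffer size heuristically from `st_blksize` where available and falls back to `io.DEFAULT_BUFFER_SIZE` otherwise.

```python
import io
import os
import tempfile


def get_block_size(file_descriptor):
    """Return the advertised block size, falling back to Python's default buffer size."""
    # st_blksize is not available on all platforms, hence the getattr.
    return getattr(os.fstat(file_descriptor), "st_blksize", 0) or io.DEFAULT_BUFFER_SIZE


with tempfile.NamedTemporaryFile() as temporary_file:
    temporary_file.write(bytes(16 * 1024))
    temporary_file.flush()

    # buffering=0 is only allowed in binary mode and returns a raw io.FileIO object.
    with open(temporary_file.name, "rb", buffering=0) as raw_file:
        is_unbuffered = isinstance(raw_file, io.FileIO)
        block_size = get_block_size(raw_file.fileno())

        if hasattr(os, "pread"):
            # Requests only the 1 KiB that is needed, independent of the block size.
            small_read = os.pread(raw_file.fileno(), 1024, 4096)
        else:
            raw_file.seek(4096)
            small_read = raw_file.read(1024)
```

On a file system that advertises a large block size, `block_size` is the value a caller would have to pass on so that unbuffered reads can still be issued in suitably large chunks.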