mxmlnkn / rapidgzip

Gzip Decompression and Random Access for Modern Multi-Core Machines
Apache License 2.0

Use faster library for decompression when index is available #19

Closed: mxmlnkn closed this 11 months ago

mxmlnkn commented 1 year ago

When the index is available, we can use existing libraries for decompression; at least zlib works. zlib-ng might also work, but according to simple tests the performance improvements would be marginal: https://github.com/mxmlnkn/rapidgzip/issues/9#issuecomment-1591995999. libdeflate, on the other hand, might bring significant performance improvements. However, it is questionable whether libdeflate works for this use case because we need two very specialized interface functions, "set window" and "inflate prime", in order to start decoding directly at a deflate block that begins in the middle of a byte. If something like inflatePrime is not offered, then we might have to modify libdeflate, or we might be able to use a trick like the one pigz uses, which byte-aligns separate deflate streams by inserting zero-size deflate blocks. Our case might be harder, though. Byte-aligning is easy with empty non-compressed blocks, but aligning to an arbitrary bit offset within a byte will require empty Fixed Huffman or Dynamic Huffman blocks.
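The "set window" half of this can be tried out with Python's standard `zlib` module, which hands a `zdict` to a raw-deflate decompressor as its initial window. This is only a minimal sketch and not rapidgzip's implementation: the resume point is forced onto a byte boundary with `Z_SYNC_FLUSH`, so no inflatePrime equivalent is needed (Python's `zlib` module does not expose one), and the helper names `make_stream` and `resume_at` are made up for illustration.

```python
import zlib

WINDOW_SIZE = 32 * 1024  # maximum deflate back-reference distance


def make_stream(part1: bytes, part2: bytes):
    """Compress two parts into one raw deflate stream.

    Returns the stream and the byte offset at which part2 starts.
    Z_SYNC_FLUSH byte-aligns the output with an empty stored block
    but keeps the history, so part2 may reference data in part1.
    """
    compressor = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
    head = compressor.compress(part1) + compressor.flush(zlib.Z_SYNC_FLUSH)
    tail = compressor.compress(part2) + compressor.flush(zlib.Z_FINISH)
    return head + tail, len(head)


def resume_at(stream: bytes, offset: int, window: bytes) -> bytes:
    """Decompress from a byte-aligned block boundary using a stored window."""
    decompressor = zlib.decompressobj(wbits=-zlib.MAX_WBITS, zdict=window)
    return decompressor.decompress(stream[offset:])


part1 = b"".join(b"record %d: some repetitive payload\n" % i for i in range(500))
part2 = b"".join(b"record %d: some repetitive payload\n" % i for i in range(250, 750))
stream, offset = make_stream(part1, part2)

# A hypothetical index entry for the resume point: the compressed offset
# plus the last 32 KiB of decompressed output before it.
window = part1[-WINDOW_SIZE:]
assert resume_at(stream, offset, window) == part2
```

Because the second part repeats text from the first, the compressor is free to emit back-references into the earlier data, which is why the stored window has to be supplied when resuming.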

I might even be able to use igzip, although it might be troublesome to implement the NASM-dependent build process in setuptools, so I should probably use igzip only when building the wheels and not by default. The same question as for libdeflate, namely how to implement the inflatePrime call, applies here. A comparison benchmark would also need to be added.
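To make the inflatePrime question concrete, here is a rough sketch of the arithmetic behind such a call, following the approach used in zlib's `zran.c` example. Deflate packs bits starting from the least significant bit of each byte, so a block that starts at a bit offset inside a byte owns the upper bits of that byte. The function name `prime_arguments` and its return layout are assumptions for illustration; any library lacking an inflatePrime-like entry point would need another way to consume these leftover bits.

```python
def prime_arguments(stream: bytes, bit_offset: int):
    """Return (bits, value, next_byte) for a block starting at bit_offset.

    bits and value are what would be handed to zlib's
    inflatePrime(strm, bits, value); next_byte is the index of the
    first whole byte to feed to the decoder afterwards.
    """
    byte_index, used_bits = divmod(bit_offset, 8)
    if used_bits == 0:
        return 0, 0, byte_index  # already byte-aligned, nothing to prime
    bits = 8 - used_bits  # bits of this byte that belong to the block
    value = stream[byte_index] >> used_bits  # drop the bits of the previous block
    return bits, value, byte_index + 1


# A block starting 3 bits into the first byte owns that byte's upper 5 bits.
bits, value, next_byte = prime_arguments(bytes([0b10110100, 0xFF]), 3)
```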

mxmlnkn commented 11 months ago

Implemented in 0.8.1. The NASM build process was indeed troublesome to integrate into setuptools for all three platforms.