karelbilek closed this issue 4 months ago
@karelbilek Yes, I've seen it. It is not trivial. In short, it searches for likely block start positions and decodes from them "optimistically", backfilling the missing history once it is available and handling falsely detected blocks.
I don't have any current plans to implement it. It could be a fun challenge, but I don't see much "big file/long stream" use of gzip.
Also, there is bgzf for doing parallel gzip decompression with an index.
Thanks for the quick reply. Understandable.
I am now looking at this code, and it seems they somehow managed to parallelize gzip decompression:
https://github.com/mxmlnkn/rapidgzip
https://arxiv.org/abs/2308.08955
I see a huge speedup when I use it. It's in C++ (the Python layer is just a thin wrapper on top; the actual code is all C++).
Unfortunately, my zero knowledge of decompression algorithms prevents me from porting it to Go, and my zero knowledge of the Python/CMake tooling prevents me from just slapping it onto Go code with cgo :D