klauspost / pgzip

Go parallel gzip (de)compression
MIT License

rapidgzip #58

Closed by karelbilek 4 months ago

karelbilek commented 4 months ago

I am now looking at this code, and it seems they somehow managed to parallelize gzip decompression:

https://github.com/mxmlnkn/rapidgzip

https://arxiv.org/abs/2308.08955

I see a crazy speedup when I use it. It's in C++ (the Python part is just a sprinkle on top; the actual code is all C++).

Unfortunately, my zero knowledge of decompression algorithms prevents me from porting it to Go, and my zero knowledge of the Python/CMake stuff prevents me from just slapping it onto Go code with cgo :D

klauspost commented 4 months ago

@karelbilek Yes, I've seen it. It is not trivial. In short, it searches for likely block start positions and decodes "optimistically" from them, backfilling the missing history afterwards and handling falsely detected blocks.
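To make "searching for likely block start positions" concrete, here is a minimal Go sketch of one cheap filter. It is not rapidgzip's actual code. It relies only on the DEFLATE header layout from RFC 1951: one BFINAL bit, two BTYPE bits, and, for dynamic-Huffman blocks, a 5-bit HLIT field that must be at most 29. The function names (`plausibleDynamicBlock`, `candidateBlockStarts`) are hypothetical, and skipping final blocks is an assumption of this sketch. A real detector would also have to validate the Huffman tables that follow and cope with the false positives this check lets through.

```go
package main

// bitsAt reads n bits starting at bit offset pos. DEFLATE packs header
// fields least-significant bit first within each byte.
func bitsAt(data []byte, pos, n int) uint {
	var v uint
	for i := 0; i < n; i++ {
		p := pos + i
		bit := uint(data[p/8]>>(uint(p)%8)) & 1
		v |= bit << uint(i)
	}
	return v
}

// plausibleDynamicBlock (hypothetical helper) reports whether the bits at
// pos could be the header of a non-final dynamic-Huffman DEFLATE block.
// It checks BFINAL == 0, BTYPE == 2 and HLIT <= 29, nothing more.
func plausibleDynamicBlock(data []byte, pos int) bool {
	// BFINAL (1) + BTYPE (2) + HLIT (5) + HDIST (5) = 13 header bits.
	if pos < 0 || pos+13 > len(data)*8 {
		return false
	}
	bfinal := bitsAt(data, pos, 1)
	btype := bitsAt(data, pos+1, 2)
	hlit := bitsAt(data, pos+3, 5)
	return bfinal == 0 && btype == 2 && hlit <= 29
}

// candidateBlockStarts (hypothetical helper) scans every bit offset and
// returns the ones that pass the header check. Most hits in real data are
// false positives that a full decoder would have to reject later.
func candidateBlockStarts(data []byte) []int {
	var starts []int
	for pos := 0; pos+13 <= len(data)*8; pos++ {
		if plausibleDynamicBlock(data, pos) {
			starts = append(starts, pos)
		}
	}
	return starts
}
```

Blocks can start at any bit offset, not just on byte boundaries, which is why the scan steps one bit at a time. Decoding from such a candidate still lacks the preceding 32 KiB window, which is the "missing history" part of the problem.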

I don't have any current plans to implement it. It could be a fun challenge, but I don't see much "big file/long stream" use of gzip.

klauspost commented 4 months ago

Also, there is BGZF for doing parallel gzip decompression with an index.
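The idea behind block-indexed formats can be sketched with the Go standard library alone. This is not BGZF itself and not pgzip's API. It is a minimal illustration under the assumption that the writer emits each chunk as an independent gzip member and records where every member starts. The function names (`compressBlocks`, `decompressParallel`) and the in-memory offset slice standing in for an index are hypothetical.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"io"
	"sync"
)

// compressBlocks (hypothetical helper) writes each blockSize chunk of data
// as its own gzip member and returns the concatenated stream together with
// the byte offset at which every member starts.
func compressBlocks(data []byte, blockSize int) ([]byte, []int, error) {
	var out bytes.Buffer
	var offsets []int
	for start := 0; start < len(data); start += blockSize {
		end := start + blockSize
		if end > len(data) {
			end = len(data)
		}
		offsets = append(offsets, out.Len())
		zw := gzip.NewWriter(&out)
		if _, err := zw.Write(data[start:end]); err != nil {
			return nil, nil, err
		}
		// Close flushes the member and writes its CRC and size trailer.
		if err := zw.Close(); err != nil {
			return nil, nil, err
		}
	}
	return out.Bytes(), offsets, nil
}

// decompressParallel (hypothetical helper) decodes every member in its own
// goroutine and joins the results in their original order.
func decompressParallel(stream []byte, offsets []int) ([]byte, error) {
	parts := make([][]byte, len(offsets))
	errs := make([]error, len(offsets))
	var wg sync.WaitGroup
	for i, off := range offsets {
		end := len(stream)
		if i+1 < len(offsets) {
			end = offsets[i+1]
		}
		wg.Add(1)
		go func(i int, member []byte) {
			defer wg.Done()
			zr, err := gzip.NewReader(bytes.NewReader(member))
			if err != nil {
				errs[i] = err
				return
			}
			zr.Multistream(false) // stop after this single member
			parts[i], errs[i] = io.ReadAll(zr)
		}(i, stream[off:end])
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return bytes.Join(parts, nil), nil
}
```

The concatenated output is still a valid multi-member gzip stream, so an ordinary gzip reader can decode it serially. The cost is a somewhat worse compression ratio, because no member can refer back to data in an earlier one. A production version would also cap the number of goroutines instead of starting one per block.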

karelbilek commented 4 months ago

Thanks for the quick reply. Understandable.