mxmlnkn / rapidgzip

Gzip Decompression and Random Access for Modern Multi-Core Machines
Apache License 2.0

Create seek points based on decompressed size instead of compressed size #3

Closed mxmlnkn closed 1 year ago

mxmlnkn commented 1 year ago

This is almost a requirement for using pragzip to create indexes in ratarmount instead of only reading existing ones. I think indexed_gzip already does something like this.

I think this would be easier to implement than #2 and would therefore be a good start. As a first step, we could simply add all block boundaries to the Result type of the decoding routines. This would be a simple list of compressed/uncompressed offset pairs. The windows could be extracted from the decompressed data, which is returned anyway. The orchestrator thread can then use all those seek points to pick the ideal ones and store them in the index database.
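To make the idea concrete, here is a rough Python sketch of the selection step, not the actual pragzip code. The names `BlockBoundary`, `select_seek_points`, `extract_window`, and the `spacing` parameter are all made up for illustration. It assumes each decoded chunk reports its block boundaries as pairs of offsets and that the window for a seek point is at most the last 32 KiB of decompressed data before that boundary, which is the maximum Deflate back-reference distance.

```python
from dataclasses import dataclass
from typing import List

# Maximum distance a Deflate back-reference can reach.
WINDOW_SIZE = 32 * 1024


@dataclass(frozen=True)
class BlockBoundary:
    """Hypothetical record a decoding routine could return per block."""
    compressed_bit_offset: int   # start of the block in the compressed stream
    decompressed_offset: int     # decompressed bytes preceding the block


def select_seek_points(boundaries: List[BlockBoundary],
                       spacing: int) -> List[BlockBoundary]:
    """Keep boundaries that are at least `spacing` decompressed bytes apart."""
    selected: List[BlockBoundary] = []
    for boundary in sorted(boundaries, key=lambda b: b.decompressed_offset):
        if (not selected
                or boundary.decompressed_offset
                - selected[-1].decompressed_offset >= spacing):
            selected.append(boundary)
    return selected


def extract_window(decompressed: bytes, chunk_start: int,
                   boundary: BlockBoundary) -> bytes:
    """Return up to WINDOW_SIZE bytes preceding `boundary`.

    `decompressed` is the data of one chunk that begins at the absolute
    decompressed offset `chunk_start`. The result is shorter than
    WINDOW_SIZE if the chunk holds less data before the boundary.
    """
    end = boundary.decompressed_offset - chunk_start
    if not 0 <= end <= len(decompressed):
        raise ValueError("boundary lies outside the given chunk")
    return decompressed[max(0, end - WINDOW_SIZE):end]
```

The orchestrator would call something like `select_seek_points` on the merged boundary lists of all chunks and then store each selected pair together with its window in the index. Spacing by decompressed offset is what makes the seek points independent of the compression ratio.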

mxmlnkn commented 1 year ago

Implemented in https://github.com/mxmlnkn/indexed_bzip2/commit/9506c2d1c59fc526615d8345f0a9d3f232e9276e