mxmlnkn / rapidgzip

Gzip Decompression and Random Access for Modern Multi-Core Machines
Apache License 2.0

Parallelize marker replacement #4

mxmlnkn closed 1 year ago

mxmlnkn commented 1 year ago

The decoding works in two steps:

  1. Decode with a placeholder backreference buffer whose entries are initialized to 16-bit indexes instead of real data.
  2. Replace those 16-bit indexes (markers) with the actual backreference contents.
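The second step can be sketched roughly as follows. This is only a minimal illustration, not the code in the repository: the marker encoding (values below 256 are literal bytes, values from `MARKER_BASE` upward are indexes into the window) and the names `MARKER_BASE`, `WINDOW_SIZE`, and `replaceMarkers` are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed encoding (hypothetical, for illustration only):
//   value <  256          -> already a decoded literal byte
//   value >= MARKER_BASE  -> marker; (value - MARKER_BASE) indexes into the window
constexpr std::size_t   WINDOW_SIZE = 32768;  // Deflate's maximum backreference distance
constexpr std::uint16_t MARKER_BASE = 32768;

// Step 2: once the real window (the last 32 KiB of the preceding data) is known,
// overwrite every marker with the window byte it refers to.
// The buffer keeps its 16-bit storage type; only the values change.
void
replaceMarkers( std::vector<std::uint16_t>&      symbols,
                const std::vector<std::uint8_t>& window )
{
    assert( window.size() == WINDOW_SIZE );
    for ( auto& symbol : symbols ) {
        if ( symbol >= MARKER_BASE ) {
            symbol = window[symbol - MARKER_BASE];
        }
    }
}
```

After this pass, every element of `symbols` is below 256, so the buffer can be narrowed to 8-bit storage in a separate pass.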

Currently, the second step runs on the orchestrator thread, which might limit performance. In benchmarks, marker replacement reaches 12 GB/s, while compacting the buffers from the 16-bit storage type, which at that point contains only 8-bit values, reaches only 4 GB/s.
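One way to move this work off a single thread is to split the buffer into slices and resolve each slice on its own worker, because every element depends only on the already-known window. The sketch below is an assumption-laden illustration and not what the linked commit does: the function name `resolveMarkersParallel`, the marker encoding (`markerBase`), and the choice to fuse marker replacement with the 16-bit to 8-bit compaction in one pass are all hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Resolves markers and narrows to 8-bit output in one pass, split over several threads.
// Assumes window.size() >= 32768 and that non-marker symbols are below 256.
std::vector<std::uint8_t>
resolveMarkersParallel( const std::vector<std::uint16_t>& symbols,
                        const std::vector<std::uint8_t>&  window,
                        std::size_t                       threadCount )
{
    constexpr std::uint16_t markerBase = 32768;  // assumed marker encoding (hypothetical)

    std::vector<std::uint8_t> result( symbols.size() );
    threadCount = std::max<std::size_t>( 1, threadCount );
    const std::size_t sliceSize = ( symbols.size() + threadCount - 1 ) / threadCount;

    // Each worker reads and writes a disjoint index range, so no locking is needed.
    const auto processSlice = [&] ( std::size_t begin, std::size_t end ) {
        for ( std::size_t i = begin; i < end; ++i ) {
            const auto symbol = symbols[i];
            result[i] = symbol >= markerBase
                        ? window[symbol - markerBase]
                        : static_cast<std::uint8_t>( symbol );
        }
    };

    std::vector<std::thread> workers;
    for ( std::size_t begin = 0; begin < symbols.size(); begin += sliceSize ) {
        const auto end = std::min( begin + sliceSize, symbols.size() );
        workers.emplace_back( processSlice, begin, end );
    }
    for ( auto& worker : workers ) {
        worker.join();
    }
    return result;
}
```

Spawning threads per call keeps the example short; a real decoder would more plausibly hand each chunk to an existing thread pool as soon as that chunk's window becomes available.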

All in all, this is slowly becoming an academic/high-performance computing issue rather than one of general ratarmount/pragzip usage, but it would still be nice to have.

mxmlnkn commented 1 year ago

Implemented with https://github.com/mxmlnkn/indexed_bzip2/commit/6cb4ab6b79eabbc3fa67a7edbc0c327f5e794b74