This PR adds a run-length encoded bitvector similar to the one used in RLCSA. The vector is encoded as a sequence of maximal runs of unset and set bits.
If there are n0 unset bits followed by n1 set bits, the run is encoded as a pair of integers (n0, n1 - 1). The integers are encoded using 4-bit code units with a 3+1-bit encoding. The encoded runs are concatenated to form 32-byte blocks, which are padded with 0s when necessary. For each block, we store a sample (rank(i, 1), i), where i is the number of bits encoded up to that block. There are also second-level indexes for narrowing down the range of samples for each query type.
This PR adds a run-length encoded bitvector similar to the one used in RLCSA. The vector is encoded as a sequence of maximal runs of unset and set bits.
If there are
n0
unset bits followed byn1
set bits, the run is encoded as a pair of integers(n0, n1 - 1)
. The integers are encoded using 4-bit code units with a 3+1-bit encoding. The encoded runs are concatenated to form 32-byte blocks, which are padded with 0s when necessary. For each block, we store a sample(rank(i, 1), i)
, wherei
is the number of bits encoded up to that block. There are also second-level indexes for narrowing down the range of samples for each query type.