jltsiren / simple-sds

Simple succinct data structures (in Rust)
MIT License
45 stars 7 forks source link

Smaller SparseVector #6

Closed jltsiren closed 3 years ago

jltsiren commented 3 years ago

Like SDSL, simple-sds used to create 1 << log2(m) buckets for the high parts of the values in SparseVector. This often created unnecessary buckets for positions that did not exist in the bitvector. After this PR, there will only be buckets for positions 0 (inclusive) to n (exclusive), which may reduce the size of the structure significantly.

The width of the low parts also used a poor approximation inherited from SDSL: ceil(log2(n + 1)) - ceil(log2(m + 1)). Now the implementation rounds the optimal value log2((n * ln(2)) / m) to the nearest integer.