Closed Keats closed 2 years ago
Elias–Fano encoding does not store enough information to support indexed access to the missing values efficiently. The SDSL equivalent of select_zero()
is basically binary search with select()
to find the right run of 0s and the total number of 0s before that run. You can optimize it a bit by precomputing the results for the first few levels of binary search. And once the range is narrow enough, using select_iter()
instead of select()
and continuing with a sequential scan will be faster.
I'll probably implement this eventually, but determining good parameters for the optimizations will take some time.
Amazing, thanks!
Hello there,
First thanks for the very nice library!
I see in the README on Planned Functionality:
and in the docs:
Our usecase is the following: we have big bitvectors (7bn+ bits) and some of them are very dense (think 99+%). We already use SparseVector when they are sparse and we currently use https://github.com/ajalab/fid when they are not. It works ok but FID is 10-30x slower to build than simple-sds bitvectors. One solution would be to use an inverted sparse vector but we would need to be able to select the real 1s on it. Does it make sense? I could use a normal
BitVector
but it would end up being much bigger than aSparseVector
.