simongog / sdsl-lite

Succinct Data Structure Library 2.0
Other
2.22k stars 349 forks source link

Or and And operations for bitvectors rrr_vector or sd_vector #329

Open abhijitiitr opened 8 years ago

abhijitiitr commented 8 years ago

This is a theoretical question related to bitwise operations on compressed bitvectors. Lucene and other search engines use compressed bitmaps or roaring bitmaps to compute the OR between two representations of bitvectors, while also exploiting SIMD instructions for computing the results.

Is there a similar mechanism in the library to intersect or compute the unions of compressed bitvectors rrr_vector and sd_vector? If not, can it be done? Can you direct me towards the academic literature concerning the bitwise operations on succinct bitvectors.

jltsiren commented 8 years ago

I'm not aware of any literature on the topic.

With rrr_vector, the best idea is probably to decompress the bitvector one block at a time, do the operations with the decompressed blocks, and then compress the results if needed. Decompressing a block takes a similar amount of effort as extracting a single bit. rrr_vector::get_int() already implements something like that.

The best you can do with an sd_vector is an interator that lists the positions containing 1-bits. A select(i) query is implemented as

low[i - 1] + ((high_1_select(i) + 1 - i) << wl)

The lower wl bits of the positions are stored in array low, while the differences between the integers represented by the high-order bits are encoded in unary in uncompressed bitvector high. Because bitvector high is guaranteed to be dense, you can just count the number of 0-bits between successive 1-bits in high to update the high-order bits for the next position.

abhijitiitr commented 8 years ago

Thanks for your response @jltsiren