quickwit-oss / bitpacking

SIMD algorithms for integer compression via bitpacking. This crate is a port of a C library called simdcomp.
MIT License
272 stars 30 forks source link

BitPacker4x::num_bits_sorted SIGSEGV on empty arrays #23

Closed Kerollmops closed 4 years ago

Kerollmops commented 4 years ago

When called with an empty array the num_bits_sorted triggers a segfault, this is not specified anywhere that the array must not be empty (or do I missed it?).

fn num_bits_sorted(&self, initial: u32, decompressed: &[u32]) -> u8 {
     unsafe { scalar::UnsafeBitPackerImpl::num_bits_sorted(initial, decompressed) }
}

If you want to easily reproduce the bug just create an empty vec![] and give it to BitPacker4x::num_bits_sorted.

fulmicoton commented 4 years ago

Good point! Let's fix this.

fulmicoton commented 4 years ago

@kerollmops also, if you intend to use bitpacking for meili, make sure to bitpack on blocks with a limit size. (typically 128 ints.)

Kerollmops commented 4 years ago

@fulmicoton, what do you mean by a block with a limit size?

fulmicoton commented 4 years ago

What I meant is : you do not want to compute the bit size for your entire postlist and stick to it. The reason is that the compression rate will then be determined by your largest delta. If your posting list has millions of elements you will probably end up having on outlier ruin the entire compression.

Kerollmops commented 4 years ago

Ok so the num_bits_sorted kind of silently ignore the rest of the slice, right? So what I have to do is to store the computed num_bits for each block along with the block?

It makes perfect sense to me, thank you!

fulmicoton commented 4 years ago

Yes. I'll update doc and add an assert.

Your first block required numbits=10 and the second one numbits=15. The second block was ignored, so only 10 bits were used. The first delta for which it was insufficient happened to be exactly 2^10=1024 but that was a coincidence.

Kerollmops commented 4 years ago

I have computed and added the num_bits values in front of all the numbers blocks. Thank you! This is fixed now!

(Whooops, the SIGSEGV is not fixed, I reopened)