zommiommy commented 3 years ago

Hi, I'm currently using your library and I noticed a particular case that raised the panic:

thread 'test_reference' panicked at 'assertion failed: k <= n', src/enum_code.rs:13:5

the assertion debug_assert!(k <= n); is reasonable so the problem must be somewhere else. Looking through the code it seems that the problem might be when a block is composed of all ones. I suppose that the problem might be in the functions write_block and encode_block.

I am happy to help fix this issue.

Tests

The smallest test with which I was able to reproduce it is:

#[test]
// THIS FAIL
fn test_reference() {
    let mut r = rsdict::RsDict::new();
    for _ in 0..65 {
        r.push(true);
    }
}

Note that if we change 65 to 64 the test pass.

#[test]
// THIS PASS
fn test_reference() {
    let mut r = rsdict::RsDict::new();
    for _ in 0..64 {
        r.push(true);
    }
}

zommiommy commented 3 years ago

I have checked the values of n and k that causes the crash: k: 64 n: 63.

From what I understand k == class which should be the number of ones in the block, therefore the problem might be at line 35 of src/enum_code.rs:

let n = (SMALL_BLOCK_SIZE as u8 - i) - 1;

sujayakar commented 3 years ago

Hi @zommiommy, thanks for the report and excellent minimal reproduction!

I went over my notes from the original paper, and the bug is actually quite interesting :)

The main idea for the code is to take a bitset of length n and k bits set and map it to an integer. There are n choose k different input values, so we want the range of our code to be in [0, B(n, k)) (where B is binomial_coefficient).

Let's define this by inducting on n. If n is zero, then the coded bitset is just zero. Otherwise, separate the bitset into its first bit, the "head," and the remaining n - 1 bits, the "tail." If the head is set, the tail has k - 1 bits, so it codes to an integer in [0, B(n - 1, k - 1)) bits. Otherwise, the tail has k bits set, so it codes to [0, B(n - 1, k)) bits.

Then, we can define our final code by laying out the zero case first from [0, B(n - 1, k)) bits and the one case afterwards from [B(n - 1, k), B(n - 1, k) + B(n - 1, k - 1)).

[------ B(n - 1, k) -----][------ B(n - 1, k - 1) ------ ]
      nth bit is zero             nth bit is one

This layout is the explanation for this block within the loop, which adds B(n - 1, k) to the result if the bit is set.

if (value >> i) & 1 != 0 {
    let n = (SMALL_BLOCK_SIZE as u8 - i) - 1;
    code += binomial_coefficient(n, k);
    k -= 1;
}

The error here, however, is that both inductive cases aren't always available to us! When we have n == k, it's not possible to have the first bit zero, since all the bits must be set. Therefore, when the bits are all set, the correct inductive "layout" is just a single case:

[------ B(n - 1, k - 1) ------ ]
        nth bit is one

Similarly, if k == 0, the same collapse happens in the opposite direction.

[------ B(n - 1, k) -----]
      nth bit is zero

So, the fix is just adding a conditional to our block:

if n > k && (value >> i) & 1 != 0 {
    let n = (SMALL_BLOCK_SIZE as u8 - i) - 1;
    code += binomial_coefficient(n, k);
    k -= 1;
}

I'm actually traveling tomorrow but will make this fix and add appropriate tests in the next few days. This is a great example where over-reliance on tools like quickcheck can bite!

sujayakar commented 3 years ago

Just kidding, instead of packing I fixed this instead :)

I just uploaded version "0.0.4" which has the fix. By the way, if you're using this for https://github.com/zommiommy/elias_fano_rust, be sure to set the "simd" feature on and RUSTFLAGS="-C target-cpu=native" for maximum performance on rank!

I haven't tried SIMD accelerating select yet, but it could be done. If you have a specific workload in mind, I'd be interested in taking a crack at it.

zommiommy commented 3 years ago

Thanks a lot! I'm gonna benchmark it today!

Yep, I'm using it for the Elias Fano implementation, where basically we have a bit-vector where we save in 'unary' the higher bits of the values. This vector has a lot of nice properties such as that it has half ones and half zeros. Moreover, unfortunately, to implement rank and select on this structure we only need the select on the higher bits. So we are particularly interested in the performances of select on "dense" bit-vectors. And if we can compress it's a plus.

We are implementing this for ensmallen which is our library to do fast random walks on compressed graphs.

What's the best way to enable: RUSTFLAGS="-C target-cpu=native"? In our case, we are building Python bindings using PyO3 and maturin so I don't really know how to enable these flags.

By the way, in our experiments, we found that -C inline-threshold=10000000 (as suggested by https://twitter.com/whitequark ) yields a 5% improvement of benchmarks on our test case. So you might want to check it out.

Thanks again!

sujayakar commented 3 years ago

What's the best way to enable: RUSTFLAGS="-C target-cpu=native"?

sorry I missed this! Just setting the environment variable should be sufficient. (I think https://github.com/LucaCappelletti94/ensmallen_graph/tree/master/bindings/python#exploiting-avx should work, for example.)

sujayakar / rsdict

Binomail Coefficent assert fails if all ones in block #1

Tests