quickwit-oss / bitpacking

SIMD algorithms for integer compression via bitpacking. This crate is a port of a C library called simdcomp.
MIT License
267 stars 30 forks source link

BitPacker4x en/decoding problem #24

Closed Kerollmops closed 4 years ago

Kerollmops commented 4 years ago

I have made a playground test where I show what I have done so far to en/decode a list of ordered u32s, the list of integers I use in this example are not en/decoded correctly it seems like there is a difference of exactly 1024 (2^10) between the original numbers 19984 and 21086 (which changed into 20062).

Where do you think I could have made a mistake, is it related to https://github.com/tantivy-search/bitpacking/issues/23#issuecomment-621728292 ?

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=a11e9600e3c3f8e1d4420527668673d2

(Unfortunately bitpacking and zerocopy are not supported by the playground you will have to copy/paste it, sorry for the inconvenience)

fulmicoton commented 4 years ago

Can you simplify the code to bring it down to one bitpacked block?

Kerollmops commented 4 years ago

Here I extracted the values that didn't work for me. The problem is always located between 19984 and 21086 (which becomes 20062), the diff is exactly 1024 (2^10) and this is the value use for num_bits.

fn encode_decode_aie() {
    let slice = &[18655, 18656, 18692, 18719, 18766, 19226, 19504, 19674, 19892, 19958, 19984, 21086, 21187, 21324, 21325, 21383, 21896, 21911, 21912, 22094, 22170, 22334, 23006, 23192, 23193, 23492, 23598, 23742, 23801, 24015, 24092, 24116, 24117, 24449, 24771, 24778, 24787, 24788, 24789, 25409, 25836, 25853, 27088, 27111, 27170, 27811, 28027, 28103, 28104, 28105, 28106, 28108, 28111, 28114, 28115, 28116, 28118, 28119, 28120, 28121, 28261, 28364, 28485, 29205, 29373, 29435, 29436, 29719, 29943, 29944, 31207, 31519, 31593, 31595, 31908, 32524, 32777, 33312, 33653, 34425, 34608, 34609, 35121, 35123, 35125, 35585, 35588, 35793, 36335, 36336, 36935, 36966, 37105, 37171, 37172, 37173, 37305, 37534, 37768, 37769, 37942, 38077, 38078, 38277, 38278, 38514, 38919, 39055, 39684, 39795, 40058, 41314, 41351, 41352, 41353, 41354, 41567, 41814, 42473, 43017, 43112, 43303, 43442, 43727, 44062, 44063, 44201, 44311][..];
    let initial_value = 18647;
    let num_bits = 10;

    let bitpacker = BitPacker4x::new();

    let mut buffer = vec![0u8; 4 * BitPacker4x::BLOCK_LEN];
    let compressed_len = bitpacker.compress_sorted(initial_value, slice, &mut buffer, num_bits);

    let mut decompressed = vec![0; BitPacker4x::BLOCK_LEN];
    bitpacker.decompress_sorted(initial_value, &buffer[..compressed_len], &mut decompressed, num_bits);

    assert_eq!(slice, decompressed.as_slice());
}
fulmicoton commented 4 years ago

It is related to issue #23. num_bits expects a slice that is exactly of size BLOCK_LEN