Closed · PSeitz closed this 3 years ago
Block bitpacking overhead: a pointer to the block (u32 or u64) plus the number of bits as a u8, for a total cost between 5 bytes (serialized byte-by-byte to avoid padding) and 8 bytes (a 7-byte pointer, or (u32, u8) with padding).

- The bigger the block, the higher the risk that outliers increase the bit_width.
- The smaller the block, the more relative overhead for metadata.
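To make the metadata side of the tradeoff concrete, here is a hypothetical back-of-the-envelope helper (not from the codebase; the function name is made up) that amortizes the fixed per-block metadata cost over the block size:

```rust
// Hypothetical helper (not from the tantivy codebase): amortized metadata
// cost of blocked bitpacking, in bits per value.
fn metadata_overhead_bits_per_value(block_size: u32, metadata_bytes: u32) -> f64 {
    (metadata_bytes * 8) as f64 / block_size as f64
}

fn main() {
    // 5 bytes of metadata per block of 128 values:
    // 40 bits / 128 values = 0.3125 bits per value.
    println!("{}", metadata_overhead_bits_per_value(128, 5));
    // With 32-value blocks the same metadata costs 1.25 bits per value.
    println!("{}", metadata_overhead_bits_per_value(32, 5));
}
```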
Memory usage during serialization is down from 44 MB to 33 MB on a timestamp fast field.
Performance looks good too
Values 0..21500:

```
test tests::bench_blockbitp_create  ... bench: 104,929 ns/iter (+/- 3,591)
test tests::bench_blockedbitp_read  ... bench:  68,370 ns/iter (+/- 1,637)
test tests::bench_vint_create       ... bench: 137,218 ns/iter (+/- 8,325)
test tests::bench_vint_read         ... bench:  88,370 ns/iter (+/- 6,387)
```
Values 0..21500, val*val encoded:

```
test tests::bench_blockbitp_create  ... bench: 140,889 ns/iter (+/- 9,809)
test tests::bench_blockedbitp_read  ... bench:  85,434 ns/iter (+/- 9,375)
test tests::bench_vint_create       ... bench: 159,304 ns/iter (+/- 7,030)
test tests::bench_vint_read         ... bench: 118,647 ns/iter (+/- 6,524)
```
With a base_value per block and decoding via offsets from that base_value, memory usage is down to 2.6 MB in the timestamp fast field case.
Performance is now similar to VInt.
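As a rough sketch of the base_value idea (illustrative only, not the actual implementation; function names are made up): storing each value as an offset from the block's minimum shrinks the bit width needed per value, which is what drives the memory savings on monotonic data like timestamps:

```rust
// Illustrative sketch: values are stored as offsets from the block's
// minimum (the base_value), which typically needs far fewer bits than
// the raw values.
fn required_bit_width(max_offset: u64) -> u8 {
    (64 - max_offset.leading_zeros()) as u8
}

/// Returns (base_value, bit_width) for a block of values.
fn block_params(block: &[u64]) -> (u64, u8) {
    let base = *block.iter().min().unwrap();
    let max_offset = block.iter().map(|v| v - base).max().unwrap();
    (base, required_bit_width(max_offset))
}

fn main() {
    // Raw timestamps would need ~31 bits each; offsets from the base fit in 3.
    let block = [1_600_000_000u64, 1_600_000_005, 1_600_000_003];
    assert_eq!(block_params(&block), (1_600_000_000, 3));
}
```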
Awesome!
128 elements emerged as the optimal block size.
Tests were run for block sizes 32, 64, 128, 256, and 512; with regard to compression ratio and speed, 128 was the clear overall winner.
Writers are used by indices to cache data before serialization. Currently the data is mostly VInt-encoded to reduce memory usage. With index sorting (#1014), a compression scheme that supports random access could be beneficial during serialization.
One solution could be per-block bitpacking with some kind of index, for instance.
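A toy sketch of what such an indexed per-block scheme could look like (all names and the bool-per-bit store are made up for illustration; a real implementation would pack into bytes): the block index stores where each block starts plus its base and bit width, so a reader can jump straight to any value without decoding preceding blocks.

```rust
// Toy sketch of per-block bitpacking with a block index for random
// access. Not tantivy's API: BlockMeta / BlockedBitpacked are invented,
// and bits are stored one bool per bit for clarity.
const BLOCK_SIZE: usize = 4; // small for the example; 128 in practice

struct BlockMeta {
    start_bit: usize,
    base: u64,
    bit_width: u8,
}

struct BlockedBitpacked {
    bits: Vec<bool>,        // toy bit store
    blocks: Vec<BlockMeta>, // the index enabling random access
}

impl BlockedBitpacked {
    fn encode(values: &[u64]) -> Self {
        let mut bits = Vec::new();
        let mut blocks = Vec::new();
        for chunk in values.chunks(BLOCK_SIZE) {
            let base = *chunk.iter().min().unwrap();
            let max_offset = chunk.iter().map(|v| v - base).max().unwrap();
            let bit_width = (64 - max_offset.leading_zeros()) as u8;
            blocks.push(BlockMeta { start_bit: bits.len(), base, bit_width });
            for v in chunk {
                let offset = v - base;
                for b in 0..bit_width {
                    bits.push(offset >> b & 1 == 1);
                }
            }
        }
        BlockedBitpacked { bits, blocks }
    }

    /// Random access: locate the block via the index, then decode one value.
    fn get(&self, idx: usize) -> u64 {
        let meta = &self.blocks[idx / BLOCK_SIZE];
        let w = meta.bit_width as usize;
        let start = meta.start_bit + (idx % BLOCK_SIZE) * w;
        let mut offset = 0u64;
        for b in 0..w {
            if self.bits[start + b] {
                offset |= 1 << b;
            }
        }
        meta.base + offset
    }
}

fn main() {
    let enc = BlockedBitpacked::encode(&[1000, 1003, 1001, 1000, 2000, 2005]);
    assert_eq!(enc.get(1), 1003); // value inside the first block
    assert_eq!(enc.get(4), 2000); // jumps directly into the second block
}
```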
Open Questions