Evaluate/Implement block-bitpacking datastructure for Writers

quickwit-oss / tantivy

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust

MIT License

12.03k stars 670 forks source link

Evaluate/Implement block-bitpacking datastructure for Writers #1027

Closed PSeitz closed 3 years ago

PSeitz commented 3 years ago

Writers are used for indices to cache data before serialization. Currently the data is mostly vint encoded to reduce memory usage. With index sorting #1014 having compression with random access could be beneficial on serialization.

One solution could be per block-bitpacking with some kind of index for instance.

[x] evaluate the memory impact of block-bitpacking.
[x] implement block-bitpacking.

Open Questions

Ideal Blocksize
apply additional compression inside block (e.g. offset per block or linear interpolation) for some data?

PSeitz commented 3 years ago

bit block overhead: pointer to block (u32 or u64), num bits u8 -> total cost between 5bytes (serialization in bytes to avoid padding) and 8bytes (7byte pointer or (u32, u8) with padding)

the bigger the block the higher the risk of outliers increasing bit_width

the smaller the block the more relative overhead for metadata

PSeitz commented 3 years ago

Memory Usage down from 44Mb to 33Mb on a timestamp fastfield during serialization

PSeitz commented 3 years ago

Performance looks good too

0..21500

test tests::bench_blockbitp_create ... bench:     104,929 ns/iter (+/- 3,591)
test tests::bench_blockedbitp_read ... bench:      68,370 ns/iter (+/- 1,637)
test tests::bench_vint_create      ... bench:     137,218 ns/iter (+/- 8,325)
test tests::bench_vint_read        ... bench:      88,370 ns/iter (+/- 6,387)

0..21500 val*val encoded

test tests::bench_blockbitp_create ... bench:     140,889 ns/iter (+/- 9,809)
test tests::bench_blockedbitp_read ... bench:      85,434 ns/iter (+/- 9,375)
test tests::bench_vint_create      ... bench:     159,304 ns/iter (+/- 7,030)
test tests::bench_vint_read        ... bench:     118,647 ns/iter (+/- 6,524)

PSeitz commented 3 years ago

With base_value and offset to base_value decoding memory usage is down to 2,6MB in the timestamp fast field case.

Performance is now similar to VInt

fulmicoton commented 3 years ago

Awesome!

PSeitz commented 3 years ago

128 elements emerged as optimal block size.

Several tests have been executed for 32, 64, 128, 256, 512. With regards to compression ratio and speed 128 was the clear overall winner.