quickwit-oss / quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
https://quickwit.io
Other
8.16k stars 333 forks source link

Fix the +1 bitpacking scheme #3740

Closed fulmicoton closed 1 year ago

fulmicoton commented 1 year ago

Right now the posting lists are compressed using delta encoding + bitpacking.

In reality the documents are strictly monotonous, so the delta is always >0. We could bitpack

(doc_n+1 - doc_n - 1) instead of (doc_n+1 - doc_n)

The benefit is usually tiny. We save 128 when this change made it possible to change the bitwidth used for the block. For a block with bitwidth b, assuming a uniform distirbution (a suspicious but useful hypothesis), we have roughly speaking 1 / 2^(b - 1) to gain 128 bits.

(8 + 128(b-1))/(8 + 128b)

So we have 1/ 2^(b-1) chances to improve our compression rate by a factor of 16 / (1 + 16*b).

This should not hurt performance too much provided we do this when we have the info in a bitpacking register.

Currently we have two unused bits in the bitwidths that we could use to ensure backward compatibility. bit 7 could for instance encode whether we want this scheme or not.

The implementation is trickier than it may seem unfortunately. (In particular what should be the initial value we use)

trinity-1686a commented 1 year ago

will be fixed next time we pull tantivy