Closed fulmicoton closed 3 years ago
Compressing blocks of data, where the compressed block either allows random access or does not.
Without random-access compressed blocks, performance would still be acceptable as long as access is on contiguous data, e.g. the last 15 minutes of data sorted by time, without filters.
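To make the trade-off concrete, here is a minimal sketch (all names hypothetical, and delta encoding stands in for a real block codec): without intra-block random access, reading a single value means decoding its whole containing block.

```rust
// Hypothetical sketch: values stored in fixed-size blocks that must be
// decoded in full. Reading one value costs a whole-block decode, which is
// cheap only when successive accesses stay within the same or nearby blocks.
const BLOCK_LEN: usize = 4;

// Toy "compression": delta-encode each block (a stand-in for a real codec;
// assumes non-decreasing values within this example).
fn compress(values: &[u64]) -> Vec<Vec<u64>> {
    values
        .chunks(BLOCK_LEN)
        .map(|block| {
            let mut encoded = Vec::with_capacity(block.len());
            let mut prev = 0u64;
            for &v in block {
                encoded.push(v - prev);
                prev = v;
            }
            encoded
        })
        .collect()
}

// Random access: decoding must start at the block boundary and walk
// forward, even to read a single value.
fn get(blocks: &[Vec<u64>], idx: usize) -> u64 {
    let block = &blocks[idx / BLOCK_LEN];
    let mut acc = 0u64;
    for (i, &delta) in block.iter().enumerate() {
        acc += delta;
        if i == idx % BLOCK_LEN {
            return acc;
        }
    }
    unreachable!()
}
```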
Our access pattern is not sequential, but it is monotonic: we call `ff_reader.get(doc)` for increasing values of `doc`.
Our time-intensive queries are the ones for which we make many such calls, which mechanically implies that the `doc` values in these calls are close to one another.
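A monotonic access pattern is exactly what a block cache exploits. A minimal sketch (all names hypothetical, with blocks stored uncompressed as a stand-in for a real codec): the reader keeps the last decoded block, so increasing, nearby `doc` ids only pay for a decode when they cross a block boundary.

```rust
// Hypothetical sketch: a reader that exploits monotone access by caching
// the last decoded block.
const BLOCK_LEN: usize = 4;

struct MonotonicReader {
    blocks: Vec<Vec<u64>>, // "compressed" blocks (stand-in for a real codec)
    cached_block_id: usize,
    cached: Vec<u64>,      // decoded values of the cached block
    decode_count: usize,   // how many block decodes actually happened
}

impl MonotonicReader {
    fn new(values: &[u64]) -> Self {
        let blocks = values.chunks(BLOCK_LEN).map(|b| b.to_vec()).collect();
        MonotonicReader {
            blocks,
            cached_block_id: usize::MAX,
            cached: Vec::new(),
            decode_count: 0,
        }
    }

    fn get(&mut self, doc: usize) -> u64 {
        let block_id = doc / BLOCK_LEN;
        if block_id != self.cached_block_id {
            // Decode (here: clone) the whole block only on a cache miss.
            self.cached = self.blocks[block_id].clone();
            self.cached_block_id = block_id;
            self.decode_count += 1;
        }
        self.cached[doc % BLOCK_LEN]
    }
}
```

With close-together `doc` ids, most calls hit the cache and the number of block decodes stays small.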
Note that Lucene relaxed its constraint on random access, which unlocked the possibility of efficiently supporting sparse doc values (LUCENE-7407). They have an interesting report on the performance impact.
This is also a valid approach for us. In the long term we might want to support different encoding schemes depending on the field (dictionary encoding, for instance). You can spend a bit of research and brain time on the broader question of what our fast fields should look like in the long term.
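For illustration, a minimal sketch of dictionary encoding for a fast field (all names hypothetical): each distinct value is stored once, documents store small ids, and random access stays O(1).

```rust
use std::collections::HashMap;

// Hypothetical sketch of dictionary encoding: store each distinct value
// once in `dict` and replace per-document values by dictionary ids.
fn dictionary_encode(values: &[u64]) -> (Vec<u64>, Vec<u32>) {
    let mut dict: Vec<u64> = Vec::new();
    let mut id_by_value: HashMap<u64, u32> = HashMap::new();
    let ids: Vec<u32> = values
        .iter()
        .map(|&v| {
            *id_by_value.entry(v).or_insert_with(|| {
                dict.push(v);
                (dict.len() - 1) as u32
            })
        })
        .collect();
    (dict, ids)
}

// Random access is O(1): one id lookup plus one dictionary lookup.
fn get(dict: &[u64], ids: &[u32], doc: usize) -> u64 {
    dict[ids[doc] as usize]
}
```

This pays off when the field has few distinct values, since the ids can then be stored in very few bits each.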
Right now fast fields are only represented using bitpacking.
It would be great to improve their representation.
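For reference, the current representation can be sketched as fixed-width bitpacking over `(value - min_value)`. This is a simplified illustration, not tantivy's actual implementation (it assumes a bit width below 64 for brevity); its key property is O(1) random access with pure arithmetic.

```rust
// Hypothetical sketch of fixed-width bitpacking: every value is stored as
// (value - min_value) in exactly `bit_width` bits, so the bit position of
// any doc's value can be computed directly.
struct BitpackedField {
    min_value: u64,
    bit_width: usize, // assumed < 64 in this sketch
    data: Vec<u64>,   // packed 64-bit words
}

impl BitpackedField {
    fn pack(values: &[u64]) -> Self {
        let min_value = values.iter().copied().min().unwrap_or(0);
        let max_delta = values.iter().map(|&v| v - min_value).max().unwrap_or(0);
        let bit_width = (64 - max_delta.leading_zeros()).max(1) as usize;
        let n_words = (values.len() * bit_width + 63) / 64 + 1;
        let mut data = vec![0u64; n_words];
        for (doc, &v) in values.iter().enumerate() {
            let delta = v - min_value;
            let pos = doc * bit_width;
            let (word, offset) = (pos / 64, pos % 64);
            data[word] |= delta << offset;
            if offset + bit_width > 64 {
                // The value straddles a word boundary: spill the high bits.
                data[word + 1] |= delta >> (64 - offset);
            }
        }
        BitpackedField { min_value, bit_width, data }
    }

    // O(1) random access: compute the bit position, extract bit_width bits.
    fn get(&self, doc: usize) -> u64 {
        let pos = doc * self.bit_width;
        let (word, offset) = (pos / 64, pos % 64);
        let mut bits = self.data[word] >> offset;
        if offset + self.bit_width > 64 {
            bits |= self.data[word + 1] << (64 - offset);
        }
        let mask = (1u64 << self.bit_width) - 1;
        self.min_value + (bits & mask)
    }
}
```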
An important target would be sorted u64 fields, like:
There are different ways to allow for random access: