Current behaviour
Currently, variable-length encoding is used by default for all data blocks (sortedIndex, hashIndex, bloomFilter, etc.).
Issue
Every read first copies the byte[] onto the heap (caching it, if configured) and then iterates over that byte[] to decode each primitive value.
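For illustration, decoding varints from a heap byte[] might look roughly like the sketch below. It assumes an unsigned LEB128-style varint (7 payload bits per byte, high bit set when more bytes follow). That is the common protobuf-style format, not necessarily SwayDB's exact one. The class and method names are hypothetical.

```java
final class HeapVarintRead {

    /** Decoded value plus the offset just past the bytes it occupied. */
    static final class Decoded {
        final long value;
        final int nextOffset;

        Decoded(long value, int nextOffset) {
            this.value = value;
            this.nextOffset = nextOffset;
        }
    }

    // Assumed format: unsigned LEB128-style varint. Each byte contributes its low
    // 7 bits; a set high bit (0x80) means another byte follows.
    static Decoded readUnsignedVarLong(byte[] bytes, int offset) {
        long result = 0;
        int shift = 0;
        int i = offset;
        while (true) {
            byte b = bytes[i++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return new Decoded(result, i);
            }
            shift += 7;
            if (shift >= 64) {
                throw new IllegalArgumentException("Malformed varint at offset " + offset);
            }
        }
    }

    public static void main(String[] args) {
        // The whole block already sits on the heap as a byte[]:
        // 300 encodes as {0xAC, 0x02} and 1 encodes as {0x01}.
        byte[] block = {(byte) 0xAC, 0x02, 0x01};
        Decoded first = readUnsignedVarLong(block, 0);
        Decoded second = readUnsignedVarLong(block, first.nextOffset);
        System.out.println(first.value + " " + second.value); // prints "300 1"
    }
}
```

Because each value's length is only known after decoding it, the offset of the n-th value can only be found by decoding all values before it.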
Solution
Off-heap ByteBuffer (#284) allocations will allow us to read primitives (Int, Long & Byte) directly from memory, which would reduce heap allocations.
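As a rough sketch of the idea, the example below stores fixed-width entries in a direct (off-heap) `ByteBuffer` and reads each primitive with the absolute `getInt`, `getLong` and `get` methods, so no intermediate byte[] is created. The entry layout (`int` + `long` + `byte`) and all names are hypothetical and chosen only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class OffHeapFixedWidthRead {

    // Hypothetical entry layout: [int keyOffset][long value][byte flag] = 13 bytes.
    static final int ENTRY_SIZE = Integer.BYTES + Long.BYTES + Byte.BYTES;

    static ByteBuffer writeEntries(int[] keyOffsets, long[] values, byte[] flags) {
        // allocateDirect reserves the memory outside the Java heap.
        ByteBuffer buffer = ByteBuffer.allocateDirect(keyOffsets.length * ENTRY_SIZE)
                                      .order(ByteOrder.BIG_ENDIAN);
        for (int i = 0; i < keyOffsets.length; i++) {
            buffer.putInt(keyOffsets[i]).putLong(values[i]).put(flags[i]);
        }
        return buffer;
    }

    // Fixed width means entry i always starts at i * ENTRY_SIZE, so each absolute
    // get reads the primitive straight from off-heap memory without copying.
    static int keyOffset(ByteBuffer buffer, int entry) {
        return buffer.getInt(entry * ENTRY_SIZE);
    }

    static long value(ByteBuffer buffer, int entry) {
        return buffer.getLong(entry * ENTRY_SIZE + Integer.BYTES);
    }

    static byte flag(ByteBuffer buffer, int entry) {
        return buffer.get(entry * ENTRY_SIZE + Integer.BYTES + Long.BYTES);
    }

    public static void main(String[] args) {
        ByteBuffer block = writeEntries(
            new int[]{0, 42}, new long[]{7L, 123_456_789_012L}, new byte[]{1, 0});
        System.out.println(keyOffset(block, 1) + " " + value(block, 1) + " " + flag(block, 0));
        // prints "42 123456789012 1"
    }
}
```

Fixed widths are what make this direct access possible: the position of any field can be computed up front instead of being discovered by decoding the preceding bytes.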
Drawback
This increases the storage cost and the required cache size, since the ByteBuffer and Unsafe APIs do not provide variable-length encoding and decoding. The encoding should therefore be configurable per data block, so we can choose between performance and storage savings, or strike a balanced tradeoff by enabling varints for some blocks and not for others.
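A minimal sketch of what a per-block setting could look like follows, together with a helper that shows the storage side of the tradeoff. The `Block` and `Encoding` enums are hypothetical and are not an existing SwayDB API. The size arithmetic assumes an unsigned varint with 7 payload bits per byte.

```java
import java.util.EnumMap;
import java.util.Map;

final class PerBlockEncoding {

    // Hypothetical names for illustration only.
    enum Block { SORTED_INDEX, HASH_INDEX, BLOOM_FILTER }

    enum Encoding { VARINT, FIXED_WIDTH }

    // Bytes needed to store a long as an unsigned varint (7 payload bits per byte).
    static int varLongSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    // Total storage for a sequence of longs under the chosen encoding.
    static long encodedSize(long[] values, Encoding encoding) {
        long total = 0;
        for (long v : values) {
            total += encoding == Encoding.VARINT ? varLongSize(v) : Long.BYTES;
        }
        return total;
    }

    public static void main(String[] args) {
        // Example balance: keep one block compact, make the others cheap to read.
        Map<Block, Encoding> config = new EnumMap<>(Block.class);
        config.put(Block.SORTED_INDEX, Encoding.VARINT);
        config.put(Block.HASH_INDEX, Encoding.FIXED_WIDTH);
        config.put(Block.BLOOM_FILTER, Encoding.FIXED_WIDTH);

        long[] sample = {5, 300, 70_000};
        for (Map.Entry<Block, Encoding> e : config.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue() + ": "
                + encodedSize(sample, e.getValue()) + " bytes");
        }
    }
}
```

For the three sample values, varints need 1 + 2 + 3 = 6 bytes, while fixed 8-byte longs need 24 bytes. That gap is the storage and cache cost paid for direct, decode-free reads.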