logv / sybil

columnar storage + NoSQL OLAP engine | https://logv.org

Only Allocate Records with the number of columns requested in the query #116

Closed okayzed closed 4 years ago

okayzed commented 4 years ago

Right now, a Record has .Ints and .Strs allocated as arrays of size max_key_id, regardless of how many columns the query asks for. This is suboptimal and slow.

Need to figure out a way to allocate only as many slots as the query requests columns for each record.
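One possible shape for this, as a minimal sketch rather than sybil's actual code: build a small table that maps each requested column ID to a compact slot index, then size the record's slices by the number of requested columns instead of max_key_id. The `KeyTable` type, its methods, and the `int64`/`string` element types are assumptions made for illustration; only the `.Ints` and `.Strs` field names come from the issue.

```go
package main

import "fmt"

// KeyTable maps a global column ID to a compact slot index.
// Hypothetical helper for illustration; not part of sybil's API.
type KeyTable struct {
	slots map[int]int
}

// NewKeyTable assigns slots 0..n-1 to the requested column IDs,
// in the order given, ignoring duplicates.
func NewKeyTable(requested []int) *KeyTable {
	kt := &KeyTable{slots: make(map[int]int, len(requested))}
	next := 0
	for _, id := range requested {
		if _, seen := kt.slots[id]; !seen {
			kt.slots[id] = next
			next++
		}
	}
	return kt
}

// Len is the number of distinct requested columns.
func (kt *KeyTable) Len() int { return len(kt.slots) }

// Slot returns the compact index for a column ID, or false if the
// column was not requested by the query.
func (kt *KeyTable) Slot(id int) (int, bool) {
	s, ok := kt.slots[id]
	return s, ok
}

// Record holds values only for requested columns (assumed field types).
type Record struct {
	Ints []int64
	Strs []string
}

// NewRecord allocates one slot per requested column, not per known key.
func NewRecord(kt *KeyTable) *Record {
	return &Record{
		Ints: make([]int64, kt.Len()),
		Strs: make([]string, kt.Len()),
	}
}

// SetInt stores v if the column was requested and reports whether it did.
func (r *Record) SetInt(kt *KeyTable, id int, v int64) bool {
	slot, ok := kt.Slot(id)
	if !ok {
		return false // column not part of the query; skip it
	}
	r.Ints[slot] = v
	return true
}

func main() {
	kt := NewKeyTable([]int{7, 42}) // query asks for two columns
	r := NewRecord(kt)
	r.SetInt(kt, 42, 1234)
	fmt.Println(len(r.Ints), r.Ints)
}
```

The trade-off in this sketch is an extra map lookup per column write in exchange for much smaller records; a slice indexed by column ID would avoid the map at the cost of one max_key_id-sized table shared across all records.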

okayzed commented 4 years ago

The first attempt is a bit hacky, but this significantly speeds up queries that ask for a subset of columns!

5b9eab2

When I tried this on jaeger spans from redbull, the query time over 138 million records dropped from 36s to 12s when asking for a single column. Asking for two int columns takes 18-20s, and adding a group by can slow the query down significantly.

I'm super excited about this performance and I find it quite surprising. I think it has to do with 1) lowering the record reset time and 2) improving cache locality.
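To illustrate the reset-time part of that guess, here is a rough sketch, assuming a record is reused between rows and cleared slot by slot. The `Reset` method, the element types, and the `maxKeyID` value of 200 are made up for the example and are not taken from sybil.

```go
package main

import "fmt"

// Record mirrors the .Ints/.Strs layout from the issue (assumed types).
type Record struct {
	Ints []int64
	Strs []string
}

// Reset clears every slot so the record can be reused for the next row
// and returns how many slots it had to touch.
func (r *Record) Reset() int {
	for i := range r.Ints {
		r.Ints[i] = 0
	}
	for i := range r.Strs {
		r.Strs[i] = ""
	}
	return len(r.Ints) + len(r.Strs)
}

func newRecord(n int) *Record {
	return &Record{Ints: make([]int64, n), Strs: make([]string, n)}
}

func main() {
	const maxKeyID = 200 // hypothetical number of known columns

	wide := newRecord(maxKeyID) // sized by max_key_id
	narrow := newRecord(1)      // sized by the single requested column

	fmt.Println(wide.Reset(), narrow.Reset()) // prints "400 2"
}
```

With these made-up numbers, each reset touches 200 times fewer slots, and the smaller record is also more likely to stay in cache while rows are scanned. Whether this accounts for the measured speedup would need profiling.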

okayzed commented 4 years ago

https://github.com/logv/sybil/tree/shorten_key_table has the current work. It is in testing and will be merged into master once it's ready.