Potential optimisations for query evaluation

desmondcheongzx commented 3 years ago

Currently we perform unions and intersections over individual records. This came as a result of needing to filter records by metrics/timestamps at the leaf level. However, this is potentially costly when we're still filtering results by labelKey and labelValue pairs.

Instead, our ResultSet could have two additional fields: a bitset field containing the relevant series in a roaring bitmap, and unpacked boolean, denoting whether the roaring bitmap has been unpacked into the vector of records.

Two ResultSets that haven't been unpacked can be unioned/intersected on their bitsets alone. When a ResultSet has been unpacked, we must unpack any other ResultSet that it is unioned/intersected with. Finally, before returning our results, we must ensure that the ResultSet has been unpacked.

n-young commented 3 years ago

Also, could apply the entire conditional to each series rather than iterating many times over

desmondcheongzx commented 3 years ago

Right, so there should be a third field called filters that would store a vector of lambda functions to apply over the data points. When evaluating a variable + metric value predicate, we get the bitmap of relevant series plus the filter to apply.

We can delay applying this conditional as long as the resultSet is involved in AND operations. Once there's an OR operation, we have no choice but to unpack both conditions.

desmondcheongzx commented 3 years ago

It's hard to evaluate without a proper benchmark, but testing query evaluation on larger data sets is still very very slow even with #13

n-young / trustdb

Potential optimisations for query evaluation #5