tspurway / hustle

A column oriented, embarrassingly distributed relational event database.

Combine optimization in restrict-select stage #21

Closed tspurway closed 10 years ago

tspurway commented 10 years ago

For aggregation functions, we currently don't take advantage of duplicate group-by keys during the first stage of the query. It is possible to 'combine' the keys by storing the partial aggregates for each key in memory while processing the input to the restrict-select stage. Note that this should only be done for low-cardinality keys: the higher the cardinality, the more memory is used, and high-cardinality keys don't benefit from a combiner anyway.
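A rough sketch of the idea, not Hustle's actual code: `combine_sum` and its `max_keys` cap are hypothetical names made up for illustration, and the aggregate is assumed to be a simple sum. Duplicate keys are folded into an in-memory dict, and once the dict holds `max_keys` distinct keys, rows with new keys are passed through uncombined so that memory stays bounded.

```python
def combine_sum(rows, max_keys=10000):
    """Combine (key, value) rows with duplicate keys into partial sums.

    rows     -- iterable of (group_key, value) pairs
    max_keys -- hypothetical cap on distinct keys held in memory
    """
    partial = {}
    for key, value in rows:
        if key in partial:
            # Duplicate key: fold into the running partial aggregate.
            partial[key] += value
        elif len(partial) < max_keys:
            # New key and there is still room in the buffer.
            partial[key] = value
        else:
            # Buffer is full: emit the row uncombined; a later stage
            # still has to finish the aggregation.
            yield key, value
    # Emit the combined partial aggregates at end of input.
    for item in partial.items():
        yield item


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
combined = sorted(combine_sum(rows))
```

Because the output is still a stream of `(key, partial_sum)` pairs, the downstream stage has to merge partial sums for the same key either way. The combiner only reduces how many records reach it.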

We should also take a look at whether the separate 'group-combine' stage is still required. Benchmark this with both high- and low-cardinality group keys to see whether it in fact saves any time.
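As a starting point for that benchmark, here is a standalone sketch that does not touch Hustle's pipeline at all. `combine`, `make_rows`, and `measure` are hypothetical helpers, and the data is synthetic. It only shows how the fraction of records surviving an in-memory combine differs between low- and high-cardinality keys. A real benchmark would have to time the full query with and without the 'group-combine' stage.

```python
import random
import time


def combine(rows):
    # Fold duplicate keys into one partial sum per key.
    partial = {}
    for key, value in rows:
        partial[key] = partial.get(key, 0) + value
    return list(partial.items())


def make_rows(n_rows, cardinality, seed=0):
    # Synthetic (key, 1) rows with keys drawn uniformly from
    # range(cardinality); seeded so runs are repeatable.
    rng = random.Random(seed)
    return [(rng.randrange(cardinality), 1) for _ in range(n_rows)]


def measure(rows):
    # Return (output records / input records, seconds spent combining).
    start = time.perf_counter()
    combined = combine(rows)
    elapsed = time.perf_counter() - start
    return len(combined) / len(rows), elapsed


results = {}
for label, cardinality in (("low", 10), ("high", 1000000)):
    ratio, elapsed = measure(make_rows(100000, cardinality))
    results[label] = ratio
    print("%s cardinality: ratio=%.4f time=%.4fs" % (label, ratio, elapsed))
```

A ratio near 0 means the combiner removed almost all records before the next stage. A ratio near 1 means it did work and used memory without reducing the data.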