vmware-archive / quickstep

Quickstep Project
Apache License 2.0

Groupby hashtable pool #236

Closed: hbdeshmukh closed this 8 years ago

hbdeshmukh commented 8 years ago

This PR adds a pooling mechanism for group-by hash tables. Previously, all worker threads in the aggregation phase shared a single hash table per aggregation handle. This degraded performance when the number of output groups was low, because many threads then contend for the same few entries.

With the new mechanism, each thread works on a private hash table drawn from the pool, and the individual hash tables in the pool are merged at the end.
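To illustrate the idea, here is a minimal sketch of a pool of private group-by tables followed by a final merge. It is not Quickstep's implementation: `GroupByTable`, `HashTablePool`, and `aggregate_sum` are hypothetical names, `std::unordered_map` stands in for Quickstep's hash tables, and the aggregate is assumed to be a single `SUM` over `int64` values grouped by an `int64` key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a group-by hash table: group key -> running SUM.
using GroupByTable = std::unordered_map<std::int64_t, std::int64_t>;

// Hypothetical pool: the mutex is taken only on checkout/return/merge,
// never per input row.
class HashTablePool {
 public:
  // Hand out a private table, reusing a returned one if available.
  std::unique_ptr<GroupByTable> checkout() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (free_.empty()) return std::make_unique<GroupByTable>();
    auto table = std::move(free_.back());
    free_.pop_back();
    return table;
  }

  // Return a table so another worker (or the final merge) can use it.
  void give_back(std::unique_ptr<GroupByTable> table) {
    assert(table != nullptr);
    std::lock_guard<std::mutex> lock(mutex_);
    free_.push_back(std::move(table));
  }

  // Merge every pooled table into one result (run after workers finish).
  GroupByTable merge() {
    std::lock_guard<std::mutex> lock(mutex_);
    GroupByTable result;
    for (const auto& table : free_) {
      for (const auto& [key, sum] : *table) result[key] += sum;
    }
    return result;
  }

 private:
  std::mutex mutex_;
  std::vector<std::unique_ptr<GroupByTable>> free_;
};

// Each worker aggregates a strided slice of (key, value) rows into its own
// private table, so the inner loop runs without any synchronization.
GroupByTable aggregate_sum(
    const std::vector<std::pair<std::int64_t, std::int64_t>>& rows,
    std::size_t num_threads) {
  HashTablePool pool;
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&rows, &pool, t, num_threads] {
      auto table = pool.checkout();
      for (std::size_t i = t; i < rows.size(); i += num_threads) {
        (*table)[rows[i].first] += rows[i].second;
      }
      pool.give_back(std::move(table));
    });
  }
  for (auto& worker : workers) worker.join();
  return pool.merge();
}
```

Because `SUM` is additive, it is safe for a worker to reuse a table that an earlier worker already returned. When there are few groups, each private table stays small, so the final merge is cheap compared with the contention a single shared table would suffer.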

Performance improves for nearly all SSB queries, and for the queries that see no improvement, the degradation is under 10%. TPC-H Q1 at SF100 saw a 4x speedup with these changes (based on @pateljm's evaluation for TPC-H).

pateljm commented 8 years ago

In a separate PR, we should allow the optimizer to turn this option on or off. Nice work @hbdeshmukh!

I'd personally love to see a few more comments around the HashTableMerger class (ok -- you knew I was going to say that :-) But it's OK to do that in a separate PR.

If it looks good to @jianqiao, we should close.

hbdeshmukh commented 8 years ago

Killed the Travis build, as GCC didn't like one of the DCHECKs.

hbdeshmukh commented 8 years ago

@jianqiao Based on our discussion, I am merging this PR as some of us are waiting for the merge in order to run the experiments. Thanks for your review comments!

jianqiao commented 8 years ago

LGTM! Sorry that I was out for a while and didn't merge in time.