vmware-archive / quickstep

Quickstep Project
Apache License 2.0

Change default aggregate_hashtable_type from LinearOpenAddressing to SeparateChaining #207

Closed jianqiao closed 8 years ago

jianqiao commented 8 years ago

This PR changes the default hashtable for aggregation from LinearOpenAddressingHashTable to SeparateChainingHashTable.

The reason is that the performance of LinearOpenAddressingHashTable degrades sharply when its initial size underestimates the number of entries it eventually has to hold, while SeparateChainingHashTable is much less sensitive to the initial size. The effect is visible when running TPC-H query 18.
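To make the difference concrete, here is a minimal toy model, not Quickstep's actual hash table code. It counts how many slots or chain entries a find-or-insert has to inspect when distinct keys are inserted into a table whose size was chosen too small. The function names (`MixHash`, `LinearProbingSteps`, `SeparateChainingSteps`) and the fixed-size, no-resize behavior are assumptions made purely for illustration; a real open-addressing table would typically resize before getting this full, which carries its own cost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy 64-bit mixer (the splitmix64 finalizer), used here as the hash function.
inline std::uint64_t MixHash(std::uint64_t x) {
  x += 0x9e3779b97f4a7c15ULL;
  x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
  x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
  return x ^ (x >> 31);
}

// Inserts the distinct keys 0..num_keys-1 into a fixed-size linear-probing
// table and returns the total number of slots inspected.
// Hypothetical model: the table never resizes, so num_keys must fit.
std::size_t LinearProbingSteps(std::size_t num_slots, std::size_t num_keys) {
  assert(num_keys <= num_slots);
  std::vector<bool> used(num_slots, false);
  std::size_t steps = 0;
  for (std::uint64_t key = 0; key < num_keys; ++key) {
    std::size_t slot = MixHash(key) % num_slots;
    ++steps;  // Inspect the home slot.
    while (used[slot]) {
      slot = (slot + 1) % num_slots;  // Probe the next slot, wrapping around.
      ++steps;
    }
    used[slot] = true;
  }
  return steps;
}

// Inserts the same keys into a separate-chaining table with a fixed number of
// buckets. Each find-or-insert inspects the bucket head plus every entry
// already in that chain (the keys are distinct, so no lookup ends early).
std::size_t SeparateChainingSteps(std::size_t num_buckets,
                                  std::size_t num_keys) {
  std::vector<std::size_t> chain_length(num_buckets, 0);
  std::size_t steps = 0;
  for (std::uint64_t key = 0; key < num_keys; ++key) {
    const std::size_t bucket = MixHash(key) % num_buckets;
    steps += 1 + chain_length[bucket];
    ++chain_length[bucket];
  }
  return steps;
}
```

Comparing `LinearProbingSteps(1024, 1000)` with `SeparateChainingSteps(1024, 1000)` shows the gap at a load factor close to 1: the probing table walks long runs of occupied slots, while the chains stay short. The chaining model also accepts more keys than buckets, for example `SeparateChainingSteps(1024, 4096)`, at the cost of proportionally longer chains.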

The long-term solution is to estimate the hash table size more accurately during query optimization.

pateljm commented 8 years ago

LGTM. Merging.

jianqiao commented 8 years ago

The test failures are caused by using SimpleScalarSeparateChainingHashTable for DISTINCT aggregations that involve one group-by attribute. In that case the distinctify hash table needs a composite key of two attributes: the group-by attribute and the aggregate argument. I will fix this problem and create a new PR.
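As a rough illustration of why the composite key matters, the sketch below computes COUNT(DISTINCT arg) ... GROUP BY group in two ways. It uses standard containers rather than Quickstep's hash tables, and the `Row` type and both function names are hypothetical. Deduplicating on the (group-by, argument) pair gives the expected counts, while deduplicating on the group-by value alone collapses every group to a single entry.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical row type: (group-by value, aggregate argument).
using Row = std::pair<int, int>;

// Deduplicates on the composite key (group-by value, argument), then counts
// the surviving pairs per group.
std::map<int, std::size_t> CountDistinctCompositeKey(
    const std::vector<Row> &rows) {
  const std::set<Row> distinct_pairs(rows.begin(), rows.end());
  std::map<int, std::size_t> counts;
  for (const Row &row : distinct_pairs) {
    ++counts[row.first];
  }
  return counts;
}

// Incorrect variant for comparison: deduplicates on the group-by value only,
// so the argument is ignored and every group ends up with a count of 1.
std::map<int, std::size_t> CountDistinctSingleKey(
    const std::vector<Row> &rows) {
  std::set<int> distinct_groups;
  for (const Row &row : rows) {
    distinct_groups.insert(row.first);
  }
  std::map<int, std::size_t> counts;
  for (const int group : distinct_groups) {
    counts[group] = 1;
  }
  return counts;
}
```

For the rows (1, 10), (1, 10), (1, 20), (2, 10), the composite-key version reports two distinct arguments for group 1 and one for group 2, whereas the single-key version reports one for each group.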