Closed pnowojski closed 7 years ago
After spending some time optimizing `InMemoryJoinHash`, I see at least two more changes that could further improve join performance. One is to create a specialized version of `InMemoryJoinHash` for single-column joins using the `long` Java type, just as is done for aggregation; we could skip hashing this single column.
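The single-long-column idea above can be sketched roughly as follows. This is an illustrative sketch only, assuming the raw `long` value can serve directly as the lookup key; `SingleLongJoinSketch` and its methods are hypothetical names, not Presto classes:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a single-long-column join keyed directly on the raw
// long value, skipping the per-column hash computation entirely.
// (A real implementation would use a primitive-specialized map, e.g. fastutil.)
public class SingleLongJoinSketch {
    private final Map<Long, Integer> positions = new HashMap<>();

    // Build side: record the position of each key; no column hashing needed.
    public void addBuildRow(long key, int position) {
        positions.put(key, position);
    }

    // Probe side: look up by the raw long key; returns null on no match.
    public Integer probe(long key) {
        return positions.get(key);
    }

    public static void main(String[] args) {
        SingleLongJoinSketch join = new SingleLongJoinSketch();
        join.addBuildRow(42L, 0);
        join.addBuildRow(7L, 1);
        System.out.println(join.probe(42L)); // build-side position for key 42
        System.out.println(join.probe(99L)); // no match
    }
}
```

A production version would of course use a primitive-keyed hash table to avoid boxing, but the point stands: with one `long` column, the key itself can drive the lookup.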
Secondly, we could think about avoiding double/triple hashing of rows. Currently hashes are calculated many times over, using different hashing algorithms for different partitioning/bucketing in different places. Correct me if I'm wrong, but for example with a simple join of two tables, rows are hashed three times:

1. `rawHash` is computed using `HashGenerator.hashPosition` to distribute data between nodes.
2. `rawHash` is hashed again in `ParallelHashBuilder` using `murmurHash3` to distribute data within a node between parallel `InMemoryJoinHash` instances.
3. `InMemoryJoinHash` calculates yet another hash by hashing `rawHash` using `XxHash64`.

I imagine that if there is also some aggregation after joining, there is yet another hash being calculated?
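The redundant work described above can be illustrated with a toy example: the same join key is hashed once, and then its hash is re-hashed twice more at later stages. The mixers below are stand-ins (a murmur3-style finalizer and a simple multiply-xorshift mix), not the exact algorithms used at each stage in Presto:

```java
// Illustrative sketch of triple hashing: one key, three hash computations.
public class TripleHashDemo {
    // Stand-in "rawHash" (murmur3 64-bit finalizer), hash #1: node-level distribution.
    static long rawHash(long key) {
        long h = key;
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Hash #2: re-hash of rawHash to pick a parallel build partition (stand-in).
    static int partition(long raw, int concurrency) {
        return (int) Long.remainderUnsigned(rawHash(raw), concurrency);
    }

    // Hash #3: yet another re-hash of rawHash inside the join's hash table (stand-in).
    static long tableHash(long raw) {
        long h = raw * 0x9E3779B97F4A7C15L;
        return h ^ (h >>> 32);
    }

    public static void main(String[] args) {
        long key = 12345L;
        long raw = rawHash(key);        // computed once per row...
        int part = partition(raw, 4);   // ...then hashed again...
        long bucket = tableHash(raw);   // ...and again.
        System.out.println(raw + " " + part + " " + bucket);
    }
}
```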
Maybe it would be more efficient to calculate one larger hash (64 bit?) and divide its entropy between operators? For example, `ParallelHashBuilder` could use the first 6 bits of the hash, `InMemoryJoinHash` the next 26 bits, and aggregation the remaining bits. I picked those values because 2^6 could be a reasonable max value for `task_hash_build_concurrency`, and `InMemoryJoinHash` already cannot handle more than 2^26 values (a limitation of the fastutil library). However, those values could be calculated dynamically as well.
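The bit-slicing proposal above can be sketched as follows, using the example split from the discussion (6 bits / 26 bits / remaining 32 bits). This is only an illustration of the idea; the class and method names are hypothetical:

```java
// Sketch of splitting one 64-bit hash's entropy between operators:
// top 6 bits for ParallelHashBuilder, next 26 bits for InMemoryJoinHash,
// remaining low 32 bits for aggregation. Not Presto's actual implementation.
public class HashEntropySplit {
    static final int PARALLEL_BITS = 6;   // 2^6 possible parallel build threads
    static final int JOIN_BITS = 26;      // fastutil limit of ~2^26 entries

    // Top 6 bits select the parallel build partition.
    static int parallelIndex(long hash) {
        return (int) (hash >>> (64 - PARALLEL_BITS));
    }

    // Next 26 bits index into the join hash table.
    static int joinBucket(long hash) {
        return (int) ((hash >>> (64 - PARALLEL_BITS - JOIN_BITS))
                & ((1 << JOIN_BITS) - 1));
    }

    // Remaining low 32 bits are left for a downstream aggregation.
    static long aggregationHash(long hash) {
        return hash & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long h = 0xDEADBEEFCAFEBABEL;
        System.out.println(parallelIndex(h));                  // 55
        System.out.println(joinBucket(h));                     // 44941039 (0x2ADBEEF)
        System.out.println(Long.toHexString(aggregationHash(h))); // cafebabe
    }
}
```

The slices are disjoint, so no operator's bucketing correlates with another's, provided the 64-bit hash mixes well across all bit positions.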
CC @losipiuk @kokosing @dain
While optimizing settings for a query that joins two relatively and equally large tables, we noticed very poor broadcast join performance. It was painfully slow (around 5 minutes) and most of the time was spent with little to no CPU usage on the worker nodes. I tracked the performance issue down to the construction of `InMemoryJoinHash`, and I was able to speed the query up (down to around ~2m20s) with `set session task_hash_build_concurrency=4;`. As expected, lots of time was still spent during `InMemoryJoinHash` construction, but now with 4 cores working out of 32 on the worker nodes. I was not able to increase concurrency further, because with a higher value the query failed with an out-of-memory message. I think we could and should optimize this step a little bit.
CC @losipiuk @kokosing