Open SilenceJD opened 3 years ago
This is interesting. @SilenceJD, could you help to dump the design doc into a google doc? The link above seems to require registration.
This is interesting. @SilenceJD, could you help to dump the design doc into a google doc? The link above seems to require registration.
Sorry, I've updated the link to Google Doc now.
is this similar to https://trino.io/blog/2020/06/14/dynamic-partition-pruning.html ?
Right, this is the same flow in https://trino.io/blog/2020/06/14/dynamic-partition-pruning.html. However, they do have bloom filter to make it more efficient. Partition is a too loose granularity. We actually tried the partition pruning and it was working pretty bad. The queries we have a super fast with seconds or sub seconds latency. The roundtrip can kill the performance. @SilenceJD, wonder in your use cases, what the the latency ranges?
Right, this is the same flow in https://trino.io/blog/2020/06/14/dynamic-partition-pruning.html. However, they do have bloom filter to make it more efficient. Partition is a too loose granularity. We actually tried the partition pruning and it was working pretty bad. The queries we have a super fast with seconds or sub seconds latency. The roundtrip can kill the performance. @SilenceJD, wonder in your use cases, what the the latency ranges?
In our scenarios, some big queries with elapsed time of 60+ seconds. I think this is benefit for big inner join queries.
@SilenceJD, could you send out a PR for this? I don't see any other approach that can gather build side digest effectively.
@SilenceJD, could you send out a PR for this? I don't see any other approach that can gather build side digest effectively.
Sure, I will migrate the code to the latest version and send out.
@SilenceJD, could you send out a PR for this? I don't see any other approach that can gather build side digest effectively
https://github.com/prestodb/presto/pull/15763 This is the PR of dynamic filter for hash join.
@highker hi, I‘ve fixed the coding style already, would you mind give more suggestions for the design?
Is this feature accepted and merged to master ?
Join is a very time-consuming operation, especially when there is a large amount of data. Currently, Presto 0.241+ supports local dynamic filtering for broadcast inner-joins, but does not have a dynamic filtering for hash inner-joins. In order to fill this gap, we propose dynamic filter for hash inner-joins.
Compared with Presto dynamic filter and dynamic filter for hash join we proposed, the main differences are as follows.
We have integrated tests for our DF and have a detailed test report. The test results show that we covered 2.30% of SQLs in Bytedance, and 54.62% of which achieved an average of 22.81% time improvement.
Our Design Doc:Presto Dynamic Filter For Hash Join Design Our Test Report:Test Report of Presto Dynamic Filter For Hash Join