prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
15.75k stars 5.28k forks

Prioritize high confidence stats during broadcast joins #23016

Closed abhinavmuk04 closed 1 day ago

abhinavmuk04 commented 2 weeks ago

Description

Prioritize the side with high-confidence stats when choosing the broadcast side of a join, if the feature is enabled.

Motivation and Context

When both sides of a join (two PlanNodes) are small enough to be broadcast, we prioritize the side that has higher-confidence stats. If both sides have high-confidence stats, we keep the original behavior. The user can turn this behavior on and off.
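The selection rule described above can be sketched in a few lines of Java. This is an illustrative sketch only, not the code in this PR: the class `ConfidenceBroadcastSketch`, the types `Side`, `Confidence`, and `Choice`, the method `choose`, and the size threshold parameter are all hypothetical names. The "original behavior" is represented here by a simple "broadcast the smaller side" fallback, which is an assumption standing in for the optimizer's existing cost-based decision.

```java
// Illustrative sketch; all names here are hypothetical and not taken from the Presto code base.
public class ConfidenceBroadcastSketch {
    // Declared in ascending order so compareTo() ranks HIGH above LOW.
    enum Confidence { LOW, HIGH }

    enum Choice { BROADCAST_LEFT, BROADCAST_RIGHT, PARTITIONED }

    static final class Side {
        final double estimatedBytes;
        final Confidence confidence;

        Side(double estimatedBytes, Confidence confidence) {
            this.estimatedBytes = estimatedBytes;
            this.confidence = confidence;
        }
    }

    static Choice choose(Side left, Side right, double maxBroadcastBytes, boolean confidenceBasedBroadcast) {
        boolean leftFits = left.estimatedBytes <= maxBroadcastBytes;
        boolean rightFits = right.estimatedBytes <= maxBroadcastBytes;

        // Neither side is small enough: no broadcast join.
        if (!leftFits && !rightFits) {
            return Choice.PARTITIONED;
        }
        // Only one side is small enough: broadcast that side.
        if (leftFits != rightFits) {
            return leftFits ? Choice.BROADCAST_LEFT : Choice.BROADCAST_RIGHT;
        }
        // Both sides fit. With the feature enabled and differing confidence,
        // prefer the side whose stats have higher confidence.
        if (confidenceBasedBroadcast && left.confidence != right.confidence) {
            return left.confidence.compareTo(right.confidence) > 0
                    ? Choice.BROADCAST_LEFT
                    : Choice.BROADCAST_RIGHT;
        }
        // Otherwise fall back to the pre-existing behavior.
        // Assumption: modeled here as "broadcast the smaller side".
        return left.estimatedBytes <= right.estimatedBytes
                ? Choice.BROADCAST_LEFT
                : Choice.BROADCAST_RIGHT;
    }

    public static void main(String[] args) {
        Side left = new Side(10, Confidence.LOW);
        Side right = new Side(50, Confidence.HIGH);
        System.out.println(choose(left, right, 100, true));  // prints BROADCAST_RIGHT
        System.out.println(choose(left, right, 100, false)); // prints BROADCAST_LEFT
    }
}
```

In this sketch the confidence check only matters when both sides already qualify for broadcast, which matches the description: size eligibility is decided first, and confidence acts as a tie-breaker ahead of the existing logic.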

Impact

This change adds a user-controllable feature that can improve join optimization and reduce the execution time of broadcast join queries.

Test Plan

Added tests in both DetermineJoinDistributionType and ReorderJoinsType that check that, with the session property enabled, the node with higher-confidence stats is broadcast.

Contributor checklist

Release Notes

General Changes
* Add confidence-based broadcasting: the side of the join with the highest-confidence stats is placed on the build side.
  This can be enabled with the ``confidence_based_broadcast`` session property. :pr:`23016`

feilong-liu commented 1 day ago

Code LGTM. However, as a code owner I do not have ownership of the SystemSessionProperty file, so a committer's approval is needed to merge this change.