Closed Guangggggg closed 1 year ago
[REVIEW NOTIFICATION]
This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.
The full list of commands accepted by this bot can be found here.
Thank you for your contribution. Please run:
mvn mvn-scalafmt_2.12:format -Dscalafmt.skip=false
LGTM~
/run-all-tests
LGTM~
There is one pending check. What do I need to do to merge?
@zhangyangyu @xuanyu66 PTAL
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: shiyuhang0, xuanyu66
The pull request process is described here
Timeline:
2023-06-16 01:32:53.10259407 +0000 UTC m=+308569.517198147: :ballot_box_with_check: agreed by shiyuhang0.
2023-06-16 06:03:34.614651565 +0000 UTC m=+324811.029255636: :ballot_box_with_check: agreed by xuanyu66.
What problem does this PR solve?
A BroadcastJoin physical plan cannot be generated for a small table.
The table size is less than the default value of 'spark.sql.autoBroadcastJoinThreshold'.
Even though TableSizeEstimator.estimatedTableSize < 'spark.sql.autoBroadcastJoinThreshold', the physical plan is still not a BroadcastJoin.
In some cases this reduces the execution efficiency of TiSpark. For example, when the user sets 'spark.sql.adaptive.enabled=false', the table will never be broadcast.
What is changed and how it works?
This PR creates the 'TiDBTableScan' class to provide the 'sizeInBytes' of a table when the physical plan is generated, which can be used to determine whether to broadcast the table. A Scan class is also common for other file formats such as Parquet, ORC, and so on.
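
To illustrate the idea, here is a minimal, self-contained sketch of how a size reported by a scan could feed the broadcast decision. It is not the code from this PR and does not use Spark's classes: `SizedScan`, `TableScanSketch`, `UnsizedScanSketch` and `canBroadcast` are hypothetical names made up for this example. The 10 MB default and the meaning of -1 (broadcast disabled) are taken from the Spark documentation for 'spark.sql.autoBroadcastJoinThreshold'. Treating a missing estimate as "do not broadcast" is a modeling assumption.

```scala
// Simplified model of the broadcast decision; not TiSpark's or Spark's actual code.
object BroadcastSketch {
  // Documented default of spark.sql.autoBroadcastJoinThreshold: 10 MB.
  // Setting the threshold to -1 disables broadcasting.
  val DefaultThresholdBytes: Long = 10L * 1024 * 1024

  // Hypothetical stand-in for a scan that may report its estimated size.
  trait SizedScan {
    def sizeInBytes: Option[Long]
  }

  // Hypothetical scan over a table whose size estimate is known at planning time.
  final case class TableScanSketch(table: String, estimatedBytes: Long) extends SizedScan {
    override def sizeInBytes: Option[Long] = Some(estimatedBytes)
  }

  // Hypothetical scan that cannot report a size estimate.
  final case class UnsizedScanSketch(table: String) extends SizedScan {
    override def sizeInBytes: Option[Long] = None
  }

  // A scan is a broadcast candidate only if it reports a non-negative size
  // that does not exceed the threshold. Assumption: no estimate means no broadcast.
  def canBroadcast(scan: SizedScan, thresholdBytes: Long = DefaultThresholdBytes): Boolean =
    scan.sizeInBytes.exists(size => size >= 0 && size <= thresholdBytes)
}
```

In this model, `BroadcastSketch.canBroadcast(BroadcastSketch.TableScanSketch("small", 1024L))` is true, while the same call on an `UnsizedScanSketch` is false. That mirrors the problem described above: without a size available when the physical plan is generated, a small table is not treated as a broadcast candidate.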
Check List
Tests
Code changes
Side effects
Related changes
tidb-ansible
repository