pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
37.06k stars 5.83k forks source link

join reorder using greedy algorithm use wrong rule for the cost of limit operator #54125

Open tiancaiamao opened 3 months ago

tiancaiamao commented 3 months ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

explain SELECT  rel.date_updated, rel.sign_no, rel.cust_no, info.medium_name     AS agree_medium_name, info.medium_class    AS agree_medium_class, rel.medium_biz_no    AS biz_no, ''                   AS agree_biz_sub_no, '2'                  AS meta_type, rel.`status`, rel.sign_date, rel.term_date FROM carc_cust_medium_rel rel   INNER JOIN (SELECT sign_no FROM carc_cust_medium_rel LIMIT 20) rep  ON rep.sign_no = rel.sign_no LEFT JOIN carc_medium_info info  ON info.`id` = rel.medium_id;

mysql> explain SELECT  rel.date_updated, rel.sign_no, rel.cust_no, info.medium_name     AS agree_medium_name, info.medium_class    AS agree_medium_class, rel.medium_biz_no    AS biz_no, ''                   AS agree_biz_sub_no, '2'                  AS meta_type, rel.`status`, rel.sign_date, rel.term_date FROM carc_cust_medium_rel rel   INNER JOIN (SELECT sign_no FROM carc_cust_medium_rel LIMIT 20) rep  ON rep.sign_no = rel.sign_no LEFT JOIN carc_medium_info info  ON info.`id` = rel.medium_id;
+----------------------------------------+---------+-----------+----------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                                     | estRows | task      | access object                                      | operator info                                                                                                                                                                                                                                                                                                                                                       |
+----------------------------------------+---------+-----------+----------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection_13                          | 21.15   | root      |                                                    | carc.carc_cust_medium_rel.date_updated, carc.carc_cust_medium_rel.sign_no, carc.carc_cust_medium_rel.cust_no, carc.carc_medium_info.medium_name, carc.carc_medium_info.medium_class, carc.carc_cust_medium_rel.medium_biz_no, ->Column#64, 2->Column#65, carc.carc_cust_medium_rel.status, carc.carc_cust_medium_rel.sign_date, carc.carc_cust_medium_rel.term_date |
| └─IndexJoin_17                         | 21.15   | root      |                                                    | left outer join, inner:IndexLookUp_16, outer key:carc.carc_cust_medium_rel.medium_id, inner key:carc.carc_medium_info.id, equal cond:eq(carc.carc_cust_medium_rel.medium_id, carc.carc_medium_info.id)                                                                                                                                                              |
|   ├─IndexHashJoin_33(Build)            | 21.14   | root      |                                                    | inner join, inner:IndexLookUp_30, outer key:carc.carc_cust_medium_rel.sign_no, inner key:carc.carc_cust_medium_rel.sign_no, equal cond:eq(carc.carc_cust_medium_rel.sign_no, carc.carc_cust_medium_rel.sign_no)                                                                                                                                                     |
|   │ ├─Limit_42(Build)                  | 20.00   | root      |                                                    | offset:0, count:20                                                                                                                                                                                                                                                                                                                                                  |
|   │ │ └─IndexReader_48                 | 20.00   | root      |                                                    | index:Limit_47                                                                                                                                                                                                                                                                                                                                                      |
|   │ │   └─Limit_47                     | 20.00   | cop[tikv] |                                                    | offset:0, count:20                                                                                                                                                                                                                                                                                                                                                  |
|   │ │     └─IndexFullScan_45           | 20.00   | cop[tikv] | table:carc_cust_medium_rel, index:PRIMARY(sign_no) | keep order:false                                                                                                                                                                                                                                                                                                                                                    |
|   │ └─IndexLookUp_30(Probe)            | 20.00   | root      |                                                    |                                                                                                                                                                                                                                                                                                                                                                     |
|   │   ├─IndexRangeScan_28(Build)       | 20.00   | cop[tikv] | table:rel, index:PRIMARY(sign_no)                  | range: decided by [eq(carc.carc_cust_medium_rel.sign_no, carc.carc_cust_medium_rel.sign_no)], keep order:false                                                                                                                                                                                                                                                      |
|   │   └─TableRowIDScan_29(Probe)       | 20.00   | cop[tikv] | table:rel                                          | keep order:false                                                                                                                                                                                                                                                                                                                                                    |
|   └─IndexLookUp_16(Probe)              | 21.14   | root      |                                                    |                                                                                                                                                                                                                                                                                                                                                                     |
|     ├─IndexRangeScan_14(Build)         | 21.14   | cop[tikv] | table:info, index:PRIMARY(id)                      | range: decided by [eq(carc.carc_medium_info.id, carc.carc_cust_medium_rel.medium_id)], keep order:false                                                                                                                                                                                                                                                             |
|     └─TableRowIDScan_15(Probe)         | 21.14   | cop[tikv] | table:info                                         | keep order:false                                                                                                                                                                                                                                                                                                                                                    |
+----------------------------------------+---------+-----------+----------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
13 rows in set (0.00 sec)

3. What did you see instead (Required)

mysql> explain SELECT  rel.date_updated, rel.sign_no, rel.cust_no, info.medium_name     AS agree_medium_name, info.medium_class    AS agree_medium_class, rel.medium_biz_no    AS biz_no, ''                   AS agree_biz_sub_no, '2'                  AS meta_type, rel.`status`, rel.sign_date, rel.term_date FROM carc_cust_medium_rel rel   INNER JOIN (SELECT sign_no FROM carc_cust_medium_rel LIMIT 20) rep  ON rep.sign_no = rel.sign_no LEFT JOIN carc_medium_info info  ON info.`id` = rel.medium_id;
+----------------------------------+-------------+-----------+----------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                               | estRows     | task      | access object                                      | operator info                                                                                                                                                                                                                                                                                                                                                       |
+----------------------------------+-------------+-----------+----------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection_15                    | 21.15       | root      |                                                    | carc.carc_cust_medium_rel.date_updated, carc.carc_cust_medium_rel.sign_no, carc.carc_cust_medium_rel.cust_no, carc.carc_medium_info.medium_name, carc.carc_medium_info.medium_class, carc.carc_cust_medium_rel.medium_biz_no, ->Column#64, 2->Column#65, carc.carc_cust_medium_rel.status, carc.carc_cust_medium_rel.sign_date, carc.carc_cust_medium_rel.term_date |
| └─HashJoin_17                    | 21.15       | root      |                                                    | inner join, equal:[eq(carc.carc_cust_medium_rel.sign_no, carc.carc_cust_medium_rel.sign_no)]                                                                                                                                                                                                                                                                        |
|   ├─Limit_37(Build)              | 20.00       | root      |                                                    | offset:0, count:20                                                                                                                                                                                                                                                                                                                                                  |
|   │ └─IndexReader_43             | 20.00       | root      |                                                    | index:Limit_42                                                                                                                                                                                                                                                                                                                                                      |
|   │   └─Limit_42                 | 20.00       | cop[tikv] |                                                    | offset:0, count:20                                                                                                                                                                                                                                                                                                                                                  |
|   │     └─IndexFullScan_40       | 20.00       | cop[tikv] | table:carc_cust_medium_rel, index:PRIMARY(sign_no) | keep order:false                                                                                                                                                                                                                                                                                                                                                    |
|   └─HashJoin_31(Probe)           | 12951356.78 | root      |                                                    | left outer join, equal:[eq(carc.carc_cust_medium_rel.medium_id, carc.carc_medium_info.id)]                                                                                                                                                                                                                                                                          |
|     ├─TableReader_36(Build)      | 10125212.00 | root      |                                                    | data:TableFullScan_35                                                                                                                                                                                                                                                                                                                                               |
|     │ └─TableFullScan_35         | 10125212.00 | cop[tikv] | table:info                                         | keep order:false                                                                                                                                                                                                                                                                                                                                                    |
|     └─TableReader_34(Probe)      | 12951092.00 | root      |                                                    | data:TableFullScan_33                                                                                                                                                                                                                                                                                                                                               |
|       └─TableFullScan_33         | 12951092.00 | cop[tikv] | table:rel                                          | keep order:false                                                                                                                                                                                                                                                                                                                                                    |
+----------------------------------+-------------+-----------+----------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
11 rows in set (0.00 sec)

I expect the join order for that query is: rel (limit20) INNER JOIN rel LEFT JOIN info But the actual order is rel LEFT JOIN info INNER JOIN rel (limit 20)

4. What is your TiDB version? (Required)

master

mysql> select tidb_version();
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version()                                                                                                                                                                                                                                                                 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: v8.2.0-alpha-382-gd1f2671557-dirty
Edition: Community
Git Commit Hash: d1f2671557726507cc2ded1e4cc989a9ac55f8bb
Git Branch: master
UTC Build Time: 2024-06-19 14:59:48
GoVersion: go1.21.0
Race Enabled: false
Check Table Before Drop: false
Store: unistore |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
tiancaiamao commented 3 months ago

When we estimate the cost of a join, it's cost of A + cost of B + estimated rows of the join result X This looks reasonable.

             X
       /            \
    A              B

How to estimate the cost of A? especially when A is Limit + TableScan We use the estimated rows + cost of its children, this is unreasonable considering the limit case.

select * from rel   ;; rel has 1e7 rows
select * from info  ;; info has 1.2e7 rows
select * from rel limit 20    ;; the result set has just 20 rows
(select * from rel limit 20) JOIN rel JOIN info

For join reorder using greedy algorithm, apparently we should choose (select * from rel limit 20) as the start point.

But due to the wrong cost calculation of (select * from rel limit 20), info is choosed!

winoros commented 3 months ago

It's a known limitation of current logical cost.

tiancaiamao commented 3 months ago

https://github.com/pingcap/tidb/blob/1a24c032126dce4a79ea14b51108dbf4caf02a03/pkg/planner/core/rule_join_reorder.go#L491-L498