pingcap / tidb

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
https://pingcap.com
Apache License 2.0
37.21k stars 5.84k forks source link

correlated subqueries with Limit cannot be decorrelated #29937

Open qw4990 opened 2 years ago

qw4990 commented 2 years ago

Enhancement

mysql> create table t1 (a int, key(a));
Query OK, 0 rows affected (0.00 sec)

mysql> create table t2 (b int, key(b));
Query OK, 0 rows affected (0.00 sec)

mysql> explain select t1.a, (select t2.b from t2 where t2.b=t1.a limit 1) from t1;
+-------------------------------+----------+-----------+----------------------+------------------------------------------------------------------------------------+
| id                            | estRows  | task      | access object        | operator info                                                                      |
+-------------------------------+----------+-----------+----------------------+------------------------------------------------------------------------------------+
| Apply_14                      | 10000.00 | root      |                      | CARTESIAN left outer join                                                          |
| ├─TableReader_16(Build)       | 10000.00 | root      |                      | data:TableFullScan_15                                                              |
| │ └─TableFullScan_15          | 10000.00 | cop[tikv] | table:t1             | keep order:false, stats:pseudo                                                     |
| └─Limit_19(Probe)             | 1.00     | root      |                      | offset:0, count:1                                                                  |
|   └─IndexReader_23            | 1.00     | root      |                      | index:Limit_22                                                                     |
|     └─Limit_22                | 1.00     | cop[tikv] |                      | offset:0, count:1                                                                  |
|       └─IndexRangeScan_21     | 1.00     | cop[tikv] | table:t2, index:b(b) | range: decided by [eq(papdata.t2.b, papdata.t1.a)], keep order:false, stats:pseudo |
+-------------------------------+----------+-----------+----------------------+------------------------------------------------------------------------------------+
7 rows in set (0.00 sec)

As shown above, the query is not be decorrelated and the Apply is used, which is slow.

A better plan should be HashJoin or IndexJoin like:

mysql> explain select t1.a, (select min(t2.b) from t2 where t2.b=t1.a) from t1;
+------------------------------+----------+-----------+----------------------+------------------------------------------------------------------------------------------------------+
| id                           | estRows  | task      | access object        | operator info                                                                                        |
+------------------------------+----------+-----------+----------------------+------------------------------------------------------------------------------------------------------+
| HashJoin_12                  | 10000.00 | root      |                      | left outer join, equal:[eq(papdata.t1.a, papdata.t2.b)]                                              |
| ├─HashAgg_29(Build)          | 7992.00  | root      |                      | group by:papdata.t2.b, funcs:min(papdata.t2.b)->Column#7, funcs:firstrow(papdata.t2.b)->papdata.t2.b |
| │ └─IndexReader_36           | 9990.00  | root      |                      | index:IndexFullScan_35                                                                               |
| │   └─IndexFullScan_35       | 9990.00  | cop[tikv] | table:t2, index:b(b) | keep order:false, stats:pseudo                                                                       |
| └─TableReader_24(Probe)      | 10000.00 | root      |                      | data:TableFullScan_23                                                                                |
|   └─TableFullScan_23         | 10000.00 | cop[tikv] | table:t1             | keep order:false, stats:pseudo                                                                       |
+------------------------------+----------+-----------+----------------------+------------------------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)
winoros commented 2 years ago

This is not equivalence transformation. Or we limit its meaning?