pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.86k stars 5.8k forks source link

Correlated subquery predicates can be pushed down #34988

Open King-Dylan opened 2 years ago

King-Dylan commented 2 years ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

create table test1 (id varchar(40) not null primary key,name varchar(10),DATE datetime);
create index idx_name on test1(name);
insert into test1 values(1,'dylan1','2022-05-01');
insert into test1 values(2,'dylan2','2022-05-02');
insert into test1 values(3,'dylan3','2022-05-03');
insert into test1 values(4,'dylan4','2022-05-04');
insert into test1 values(5,'dylan5','2022-05-05');
insert into test1 values(6,'dylan6','2022-05-06');
insert into test1 values(7,'dylan7','2022-05-07');
insert into test1 values(8,'dylan8','2022-05-08');
create table test2 (id varchar(40) not null ,name varchar(10),DATE datetime);
create index idx_name on test2(name);
insert into test2 select * from test1;
analyze table test1,test2;

2. What did you expect to see? (Required)

mysql:[test] 20:27:23> explain select a.* from  test1 a where (select max(DATE) from test2 b where a.id=b.id)<=STR_TO_DATE('2022-05-07','%Y-%m-%d') and a.name ='dylan8';
+----------------------------------+---------+-----------+-------------------------------+------------------------------------------------------------------------------------------------------+
| id                               | estRows | task      | access object                 | operator info                                                                                        |
+----------------------------------+---------+-----------+-------------------------------+------------------------------------------------------------------------------------------------------+
| HashJoin_12                      | 256.00  | root      |                               | inner join, equal:[eq(test.test1.id, test.test2.id)]                                                 |
| ├─Selection_20(Build)            | 12.80   | root      |                               | le(Column#9, 2022-05-07)                                                                             |
| │ └─HashAgg_25                   | 16.00   | root      |                               | group by:test.test2.id, funcs:max(Column#11)->Column#9, funcs:firstrow(test.test2.id)->test.test2.id |
| │   └─TableReader_26             | 16.00   | root      |                               | data:HashAgg_21                                                                                      |
| │     └─HashAgg_21               | 16.00   | cop[tikv] |                               | group by:test.test2.id, funcs:max(test.test2.date)->Column#11                                        |
| │       └─TableFullScan_24       | 200.00  | cop[tikv] | table:b                       | keep order:false                                                                                     |
| └─IndexLookUp_19(Probe)          | 256.00  | root      |                               |                                                                                                      |
|   ├─IndexRangeScan_17(Build)     | 256.00  | cop[tikv] | table:a, index:idx_name(name) | range:["dylan8","dylan8"], keep order:false                                                          |
|   └─TableRowIDScan_18(Probe)     | 256.00  | cop[tikv] | table:a                       | keep order:false                                                                                     |
+----------------------------------+---------+-----------+-------------------------------+------------------------------------------------------------------------------------------------------+
9 rows in set (0.00 sec)

3. What did you see instead (Required)

mysql:[test] 20:37:55> explain select a.* from  test1 a where (select max(DATE) from test2 b where a.id=b.id)+1 > STR_TO_DATE('2022-05-07','%Y-%m-%d') and a.name ='dylan8';
+----------------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------+
| id                               | estRows | task      | access object | operator info                                                                                               |
+----------------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------+
| Projection_12                    | 6.40    | root      |               | test.test1.id, test.test1.name, test.test1.date                                                             |
| └─Selection_13                   | 6.40    | root      |               | eq(test.test1.name, "dylan8"), gt(plus(cast(Column#9, bigint(19) BINARY), 1), 20220507)                     |
|   └─HashJoin_15                  | 8.00    | root      |               | left outer join, equal:[eq(test.test1.id, test.test2.id)]                                                   |
|     ├─HashAgg_21(Build)          | 8.00    | root      |               | group by:test.test2.id, funcs:max(test.test2.date)->Column#10, funcs:firstrow(test.test2.id)->test.test2.id |
|     │ └─TableReader_26           | 8.00    | root      |               | data:TableFullScan_25                                                                                       |
|     │   └─TableFullScan_25       | 8.00    | cop[tikv] | table:b       | keep order:false                                                                                            |
|     └─TableReader_18(Probe)      | 8.00    | root      |               | data:TableFullScan_17                                                                                       |
|       └─TableFullScan_17         | 8.00    | cop[tikv] | table:a       | keep order:false                                                                                            |
+----------------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------+
8 rows in set (0.00 sec)
mysql:[test] 20:46:37> explain select a.* from  test1 a  use index(idx_name)where (select max(DATE) from test2 b where a.id=b.id)+1 > STR_TO_DATE('2022-05-07','%Y-%m-%d') and a.name ='dylan8';
+----------------------------------------+---------+-----------+-------------------------------+-------------------------------------------------------------------------------------------------------+
| id                                     | estRows | task      | access object                 | operator info                                                                                         |
+----------------------------------------+---------+-----------+-------------------------------+-------------------------------------------------------------------------------------------------------+
| Projection_12                          | 6.40    | root      |                               | test.test1.id, test.test1.name, test.test1.date                                                       |
| └─Selection_13                         | 6.40    | root      |                               | eq(test.test1.name, "dylan8"), gt(plus(cast(Column#9, bigint(19) BINARY), 1), 20220507)               |
|   └─HashJoin_16                        | 8.00    | root      |                               | left outer join, equal:[eq(test.test1.id, test.test2.id)]                                             |
|     ├─IndexLookUp_19(Build)            | 8.00    | root      |                               |                                                                                                       |
|     │ ├─IndexFullScan_17(Build)        | 8.00    | cop[tikv] | table:a, index:idx_name(name) | keep order:false                                                                                      |
|     │ └─TableRowIDScan_18(Probe)       | 8.00    | cop[tikv] | table:a                       | keep order:false                                                                                      |
|     └─HashAgg_24(Probe)                | 12.00   | root      |                               | group by:test.test2.id, funcs:max(Column#12)->Column#10, funcs:firstrow(test.test2.id)->test.test2.id |
|       └─TableReader_25                 | 12.00   | root      |                               | data:HashAgg_20                                                                                       |
|         └─HashAgg_20                   | 12.00   | cop[tikv] |                               | group by:test.test2.id, funcs:max(test.test2.date)->Column#12                                         |
|           └─TableFullScan_23           | 24.00   | cop[tikv] | table:b                       | keep order:false                                                                                      |
+----------------------------------------+---------+-----------+-------------------------------+-------------------------------------------------------------------------------------------------------+

10 rows in set (0.00 sec)

4. What is your TiDB version? (Required)

v6.0.0

AilinKid commented 2 years ago

After discussion with @winoros We thought the main problem here is where to decorrelate an apply as join. By now TiDB always dives down to find the correlated inner operator, making the apply to be a join beneath it. which will cause the join to execute as early as possible, as close to the data source as possible. For this case, the correlated selection operator in the inner side can do the reverse way, pulling up as close to apply as possible.