pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.8k stars 5.8k forks source link

Support subquery unfolding #18452

Open ghost opened 4 years ago

ghost commented 4 years ago

Feature Request

Is your feature request related to a problem? Please describe:

This is a fork of https://github.com/pingcap/tidb/issues/5736 - with the original issue resolved, this is a feature request for an optimization for simple subqueries on the same table as the base table.

Describe the feature you'd like:

Consider the following two queries in the example. They could potentially use the same plan:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (pk varchar(32) NOT NULL PRIMARY KEY, ip varchar(32), INDEX (ip));

INSERT INTO t1 VALUES
('127.0.0.1', 'test1'),
('127.0.0.2', 'test2'),
('127.0.0.3', 'test3'),
('127.0.0.4', 'test4'),
('127.0.0.5', 'test5');

SELECT SLEEP(1);
ANALYZE TABLE t1;

EXPLAIN SELECT * FROM t1 WHERE pk IN (SELECT pk FROM t1 WHERE ip IN ('127.0.0.1'));
EXPLAIN SELECT * FROM t1 WHERE ip IN ('127.0.0.1');

..

mysql> EXPLAIN SELECT * FROM t1 WHERE pk IN (SELECT pk FROM t1 WHERE ip IN ('127.0.0.1'));
+------------------------------------+---------+-----------+-----------------------------+------------------------------------------------------------------------------+
| id                                 | estRows | task      | access object               | operator info                                                                |
+------------------------------------+---------+-----------+-----------------------------+------------------------------------------------------------------------------+
| IndexJoin_16                       | 0.00    | root      |                             | inner join, inner:IndexLookUp_15, outer key:test.t1.pk, inner key:test.t1.pk |
| ├─IndexLookUp_55(Build)            | 0.00    | root      |                             |                                                                              |
| │ ├─IndexRangeScan_53(Build)       | 0.00    | cop[tikv] | table:t1, index:ip(ip)      | range:["127.0.0.1","127.0.0.1"], keep order:false                            |
| │ └─TableRowIDScan_54(Probe)       | 0.00    | cop[tikv] | table:t1                    | keep order:false                                                             |
| └─IndexLookUp_15(Probe)            | 1.00    | root      |                             |                                                                              |
|   ├─IndexRangeScan_13(Build)       | 1.00    | cop[tikv] | table:t1, index:PRIMARY(pk) | range: decided by [eq(test.t1.pk, test.t1.pk)], keep order:false             |
|   └─TableRowIDScan_14(Probe)       | 1.00    | cop[tikv] | table:t1                    | keep order:false                                                             |
+------------------------------------+---------+-----------+-----------------------------+------------------------------------------------------------------------------+
7 rows in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM t1 WHERE ip IN ('127.0.0.1');
+-------------------------------+---------+-----------+------------------------+---------------------------------------------------+
| id                            | estRows | task      | access object          | operator info                                     |
+-------------------------------+---------+-----------+------------------------+---------------------------------------------------+
| IndexLookUp_10                | 0.00    | root      |                        |                                                   |
| ├─IndexRangeScan_8(Build)     | 0.00    | cop[tikv] | table:t1, index:ip(ip) | range:["127.0.0.1","127.0.0.1"], keep order:false |
| └─TableRowIDScan_9(Probe)     | 0.00    | cop[tikv] | table:t1               | keep order:false                                  |
+-------------------------------+---------+-----------+------------------------+---------------------------------------------------+
3 rows in set (0.00 sec)

For context, this is not currently an optimization supported by MySQL:

mysql [localhost:8020] {msandbox} (test) > EXPLAIN SELECT * FROM t1 WHERE pk IN (SELECT pk FROM t1 WHERE ip IN ('127.0.0.1'));
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref        | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | ref    | PRIMARY,ip    | ip      | 131     | const      |    1 |   100.00 | Using index |
|  1 | SIMPLE      | t1    | NULL       | eq_ref | PRIMARY       | PRIMARY | 130     | test.t1.pk |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

mysql [localhost:8020] {msandbox} (test) > EXPLAIN SELECT * FROM t1 WHERE ip IN ('127.0.0.1');
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | ref  | ip            | ip   | 131     | const |    1 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

Describe alternatives you've considered:

N/A

Teachability, Documentation, Adoption, Migration Strategy:

Should be a transparent optimization.

zz-jason commented 4 years ago

The default execution plan I think is the same with unfolding the subquery and then execute the whole query in this case because of the IndexJoin:

mysql> EXPLAIN SELECT * FROM t1 WHERE pk IN (SELECT pk FROM t1 WHERE ip IN ('127.0.0.1'));
+------------------------------------+---------+-----------+-----------------------------+------------------------------------------------------------------------------+
| id                                 | estRows | task      | access object               | operator info                                                                |
+------------------------------------+---------+-----------+-----------------------------+------------------------------------------------------------------------------+
| IndexJoin_16                       | 0.00    | root      |                             | inner join, inner:IndexLookUp_15, outer key:test.t1.pk, inner key:test.t1.pk |
| ├─IndexLookUp_55(Build)            | 0.00    | root      |                             |                                                                              |
| │ ├─IndexRangeScan_53(Build)       | 0.00    | cop[tikv] | table:t1, index:ip(ip)      | range:["127.0.0.1","127.0.0.1"], keep order:false                            |
| │ └─TableRowIDScan_54(Probe)       | 0.00    | cop[tikv] | table:t1                    | keep order:false                                                             |
| └─IndexLookUp_15(Probe)            | 1.00    | root      |                             |                                                                              |
|   ├─IndexRangeScan_13(Build)       | 1.00    | cop[tikv] | table:t1, index:PRIMARY(pk) | range: decided by [eq(test.t1.pk, test.t1.pk)], keep order:false             |
|   └─TableRowIDScan_14(Probe)       | 1.00    | cop[tikv] | table:t1                    | keep order:false                                                             |
+------------------------------------+---------+-----------+-----------------------------+------------------------------------------------------------------------------+
7 rows in set (0.00 sec)

In the above execution plan, the build side of IndexJoin is an IndexRangeScan, the probe side of the IndexJoin is another IndexRangeScan, the range is calculated based on the result of the build side IndexRangeScan.

zz-jason commented 4 years ago

There might be a change that the root operator is not IndexJoin_16. For example, it might be a Hash Join depends on the selectivity of both join sides and the selectivity of join operator.

zz-jason commented 4 years ago

@winoros @eurekaka PTAL

morgo commented 3 years ago

This is similar to https://github.com/pingcap/tidb/issues/25540