pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.53k stars 5.74k forks source link

ordering optimization: adopt functional dependency into the current optimizer #14580

Open zz-jason opened 4 years ago

zz-jason commented 4 years ago

Description

Is your feature request related to a problem? Please describe:

Let's say we have this table:

drop table if exists t;
create table t(a bigint, b bigint, index idx_a(a));

Suppose the best plan for the following query is to use index idx_a, which avoids a sorting:

TiDB(root@127.0.0.1:test) > desc select * from t use index(idx_a) where a = b order by a;
+------------------------+----------+-----------+--------------------------------------------------------------------+
| id                     | count    | task      | operator info                                                      |
+------------------------+----------+-----------+--------------------------------------------------------------------+
| Projection_17          | 8000.00  | root      | test.t.a, test.t.b                                                 |
| └─IndexLookUp_16       | 8000.00  | root      |                                                                    |
|   ├─IndexScan_13       | 10000.00 | cop[tikv] | table:t, index:a, range:[NULL,+inf], keep order:true, stats:pseudo |
|   └─Selection_15       | 8000.00  | cop[tikv] | eq(test.t.a, test.t.b)                                             |
|     └─TableScan_14     | 10000.00 | cop[tikv] | table:t, keep order:false, stats:pseudo                            |
+------------------------+----------+-----------+--------------------------------------------------------------------+
5 rows in set (0.00 sec)

If we change the query's order by column from a to b, even if we force to use the index idx_a, the execution plan still needs a sorting:

TiDB(root@127.0.0.1:test) > desc select * from t use index(idx_a) where a = b order by b;
+-----------------------+----------+-----------+---------------------------------------------------------------------+
| id                    | count    | task      | operator info                                                       |
+-----------------------+----------+-----------+---------------------------------------------------------------------+
| Sort_5                | 8000.00  | root      | test.t.b:asc                                                        |
| └─IndexLookUp_11      | 8000.00  | root      |                                                                     |
|   ├─IndexScan_8       | 10000.00 | cop[tikv] | table:t, index:a, range:[NULL,+inf], keep order:false, stats:pseudo |
|   └─Selection_10      | 8000.00  | cop[tikv] | eq(test.t.a, test.t.b)                                              |
|     └─TableScan_9     | 10000.00 | cop[tikv] | table:t, keep order:false, stats:pseudo                             |
+-----------------------+----------+-----------+---------------------------------------------------------------------+
5 rows in set (0.00 sec)

This is because we can not recognize the functional dependency a -> b and b -> a, or the equivalence class {a, b} after the filter a = b.

ideally, the best execution plan for the two queries should be the same:

select * from t use index(idx_a) where a = b order by a;
select * from t use index(idx_a) where a = b order by b;

Describe the feature you'd like:

Document Collection

Talent Challenge Program information

Milestones and action items

Milestone 1: Clear the design proposal

Milestone 2: (Maybe) The graph structure prototype

Milestone 3: (Maybe) Integrate the graph structure to the planner.

Milestone 4: (Maybe) Show the improvements.

winoros commented 4 years ago

There's also another one describing a way to maintain the functional dependency. https://cs.uwaterloo.ca/research/tr/2000/11/CS-2000-11.thesis.pdf