pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
37.01k stars 5.82k forks source link

transform semi join to inner join #7205

Closed zz-jason closed 5 years ago

zz-jason commented 6 years ago

Take this query as an example:

select * from t1 where t1.a in (select a from t2);

The plan for this query can be:

2018-07-31 16 36 18

Here the above "semi join" can be transformed to an "inner join":

2018-07-31 16 42 43

This transform is logically equivalent, they can result different physical plans.

In some scenarios, table "t1" has an index on column "t1.a", the transformed "InnerJoin" can be implemented by a "IndexJoin", which uses the result of the "Aggregate" as the outer rows to build the index ranges on "t1.a", directly extract the matched rows on table "t1". That is to say: the physical plan of the later one can be more efficient than the former one.

winoros commented 6 years ago

It depends on stats so i think we'd better not add it in the current optimizer.

zz-jason commented 6 years ago

@winoros Yes, it's better after we have supported the volcano or cascades planner model.

winoros commented 5 years ago

Done by #7531. It will be controlled by cost in the cascades-like planner.


Why we do this in the current planner? It's shown that this way can beat unfold in-subquery in most cases.