pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.8k stars 5.8k forks source link

Correlation column has function processing and supports index join #42758

Open together-wang opened 1 year ago

together-wang commented 1 year ago

Feature Request

Is your feature request related to a problem? Please describe:

Indexed column cannot use Index Join when it undergoes function processing.

mysql> show create table test;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                          |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| test  | CREATE TABLE `test` (
  `id` bigint(20) DEFAULT NULL,
  `dt` datetime DEFAULT NULL,
  KEY `idx_dt` (`dt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select version();
+--------------------+
| version()          |
+--------------------+
| 5.7.25-TiDB-v6.5.0 |
+--------------------+
1 row in set (0.00 sec)

mysql> explain SELECT /*+ INL_JOIN(o,t) */ count(1) AS a FROM test o JOIN ( SELECT SUBSTRING_INDEX(GROUP_CONCAT(id ORDER BY dt DESC), ',', 1) AS id FROM test) t ON t.id = o.id;
+------------------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------+
| id                                 | estRows | task      | access object | operator info                                                                 |
+------------------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------+
| StreamAgg_12                       | 1.00    | root      |               | funcs:count(1)->Column#9                                                      |
| └─HashJoin_32                      | 1.25    | root      |               | inner join, equal:[eq(Column#11, Column#12)]                                  |
|   ├─Projection_18(Build)           | 1.00    | root      |               | cast(substring_index(Column#7, ,, 1), double BINARY)->Column#12               |
|   │ └─StreamAgg_22                 | 1.00    | root      |               | funcs:group_concat(Column#16 order by Column#17 desc separator ",")->Column#7 |
|   │   └─Projection_34              | 5.00    | root      |               | cast(test.test.id, var_string(20))->Column#16, test.test.dt                   |
|   │     └─TableReader_29           | 5.00    | root      |               | data:TableFullScan_28                                                         |
|   │       └─TableFullScan_28       | 5.00    | cop[tikv] | table:test    | keep order:false, stats:pseudo                                                |
|   └─Projection_15(Probe)           | 5.00    | root      |               | cast(test.test.id, double BINARY)->Column#11                                  |
|     └─TableReader_17               | 5.00    | root      |               | data:TableFullScan_16                                                         |
|       └─TableFullScan_16           | 5.00    | cop[tikv] | table:o       | keep order:false, stats:pseudo                                                |
+------------------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------+
10 rows in set, 4 warnings (0.01 sec)

mysql> show warnings;
+---------+------+-------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                   |
+---------+------+-------------------------------------------------------------------------------------------+
| Warning | 1815 | Optimizer Hint /*+ INL_JOIN(o, t) */ or /*+ TIDB_INLJ(o, t) */ is inapplicable            |
| Warning | 1105 | Aggregation can not be pushed to tikv because AggFunc `group_concat` is not supported now |
| Warning | 1105 | Aggregation can not be pushed to tikv because AggFunc `group_concat` is not supported now |
| Warning | 1815 | Optimizer Hint /*+ INL_JOIN(o, t) */ or /*+ TIDB_INLJ(o, t) */ is inapplicable            |
+---------+------+-------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

mysql>

Describe the feature you'd like:

Correlation column has function processing and supports index join.

Describe alternatives you've considered:

N/A

Teachability, Documentation, Adoption, Migration Strategy:

qw4990 commented 1 year ago

The root cause is that after applying the substring_index the 2 join key columns then have different types.

image

So to let it can use IndexJoin again, we can add an explicit cast here: d90133ce-5874-42f1-ad02-97dfd9a7a03e b7mTuhdwhc

qw4990 commented 1 year ago

The optimizer should be better to notify or warn why it cannot use IndexJoin here. Join key types are mismatched in this case.