Open mzhang77 opened 1 week ago
I'm able to reproduce the problem using this script to generate test data: https://gist.github.com/mzhang77/a7cb08949d8473edbde4a27bde0d3530
Cost for default choice MergeJoin
mysql> explain format='cost_trace' SELECT `d`.* FROM `i` LEFT JOIN `d` ON `i`.`object_id` = `d`.`object_id` WHERE `i`.`cid` = 249 ORDER BY `i`.`object_id` LIMIT 1000 OFFSET 18000;
+-----------------------------------+----------+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------------------------------------+------------------------------------------------------------------------+
| id | estRows | estCost | costFormula | task | access object | operator info |
+-----------------------------------+----------+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------------------------------------+------------------------------------------------------------------------+
| Projection_12 | 1000.00 | 1417546.02 | (((((((scan(19000*logrowsize(48)*tikv_scan_factor(40.7)))) + (net(19000*rowsize(16)*tidb_kv_net_factor(3.96))))/15.00)) + (((scan(30000*logrowsize(67.00999999999999)*tikv_scan_factor(40.7))) + (net(30000*rowsize(67.00999999999999)*tidb_kv_net_factor(3.96))))/15.00) + ((cpu(19000*filters(0)*tidb_cpu_factor(49.9))) + (cpu(30000*filters(0)*tidb_cpu_factor(49.9)))) + ((group(19000*cols(0.01)*tidb_cpu_factor(49.9))) + (group(30000*cols(0.01)*tidb_cpu_factor(49.9)))))) + ((cpu(1000*filters(0.04)*tidb_cpu_factor(49.9)))/5.00) | root | | test.d.mid, test.d.object_id, test.d.ov, test.d.version |
| └─Limit_19 | 1000.00 | 1417146.82 | ((((((scan(19000*logrowsize(48)*tikv_scan_factor(40.7)))) + (net(19000*rowsize(16)*tidb_kv_net_factor(3.96))))/15.00)) + (((scan(30000*logrowsize(67.00999999999999)*tikv_scan_factor(40.7))) + (net(30000*rowsize(67.00999999999999)*tidb_kv_net_factor(3.96))))/15.00) + ((cpu(19000*filters(0)*tidb_cpu_factor(49.9))) + (cpu(30000*filters(0)*tidb_cpu_factor(49.9)))) + ((group(19000*cols(0.01)*tidb_cpu_factor(49.9))) + (group(30000*cols(0.01)*tidb_cpu_factor(49.9))))) | root | | offset:18000, count:1000 |
| └─MergeJoin_67 | 19000.00 | 1417146.82 | (((((scan(19000*logrowsize(48)*tikv_scan_factor(40.7)))) + (net(19000*rowsize(16)*tidb_kv_net_factor(3.96))))/15.00)) + (((scan(30000*logrowsize(67.00999999999999)*tikv_scan_factor(40.7))) + (net(30000*rowsize(67.00999999999999)*tidb_kv_net_factor(3.96))))/15.00) + ((cpu(19000*filters(0)*tidb_cpu_factor(49.9))) + (cpu(30000*filters(0)*tidb_cpu_factor(49.9)))) + ((group(19000*cols(0.01)*tidb_cpu_factor(49.9))) + (group(30000*cols(0.01)*tidb_cpu_factor(49.9)))) | root | | left outer join, left key:test.i.object_id, right key:test.d.object_id |
| ├─TableReader_53(Build) | 30000.00 | 1024516.39 | ((scan(30000*logrowsize(67.00999999999999)*tikv_scan_factor(40.7))) + (net(30000*rowsize(67.00999999999999)*tidb_kv_net_factor(3.96))))/15.00 | root | | data:TableFullScan_52 |
| │ └─TableFullScan_52 | 30000.00 | 7406957.80 | scan(30000*logrowsize(67.00999999999999)*tikv_scan_factor(40.7)) | cop[tikv] | table:d | keep order:true |
| └─Limit_36(Probe) | 19000.00 | 368179.43 | ((((scan(19000*logrowsize(48)*tikv_scan_factor(40.7)))) + (net(19000*rowsize(16)*tidb_kv_net_factor(3.96))))/15.00) | root | | offset:0, count:19000 |
| └─IndexReader_49 | 19000.00 | 368179.43 | (((scan(19000*logrowsize(48)*tikv_scan_factor(40.7)))) + (net(19000*rowsize(16)*tidb_kv_net_factor(3.96))))/15.00 | root | | index:Limit_48 |
| └─Limit_48 | 19000.00 | 4318851.50 | (scan(19000*logrowsize(48)*tikv_scan_factor(40.7))) | cop[tikv] | | offset:0, count:19000 |
| └─IndexRangeScan_47 | 19000.00 | 4318851.50 | scan(19000*logrowsize(48)*tikv_scan_factor(40.7)) | cop[tikv] | table:i, index:i_3(cid, object_id) | range:[249,249], keep order:true |
+-----------------------------------+----------+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------------------------------------+------------------------------------------------------------------------+
Cost for index join choice
mysql> explain format='cost_trace' SELECT /*+ INL_JOIN(d) */ `d`.* FROM `i` LEFT JOIN `d` ON `i`.`object_id` = `d`.`object_id` WHERE `i`.`cid` = 249 ORDER BY `i`.`object_id` LIMIT 1
000
+-------------------------------------+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| id | estRows | estCost | costFormula | task | access object | operator info |
+-------------------------------------+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection_11 | 1000.00 | 10556981.64 | (((cpu(10*3*tidb_cpu_factor(49.9))) + (((((scan(19000*logrowsize(48)*tikv_scan_factor(40.7)))) + (net(19000*rowsize(16)*tidb_kv_net_factor(3.96))))/15.00)) + (cpu(19000*filters(0)*tidb_cpu_factor(49.9))) + (cpu(19000*10*tidb_cpu_factor(49.9))) + ((() + (((((scan(1*logrowsize(540)*tikv_scan_factor(40.7))) + (net(1*rowsize(540)*tidb_kv_net_factor(3.96))))/15.00)*19000.00)/6.00) + (cpu(19000*filters(0)*tidb_cpu_factor(49.9))) + ((hashkey(19000*0*tidb_cpu_factor(49.9))) + (hashmem(19000*540*tidb_mem_factor(0.2))) + (hashbuild(19000*tidb_cpu_factor(49.9)))))/5.00))) + ((cpu(1000*filters(0.04)*tidb_cpu_factor(49.9)))/5.00) | root | | test.d.mid, test.d.object_id, test.d.ov, test.d.version |
| └─Limit_18 | 1000.00 | 10556582.44 | ((cpu(10*3*tidb_cpu_factor(49.9))) + (((((scan(19000*logrowsize(48)*tikv_scan_factor(40.7)))) + (net(19000*rowsize(16)*tidb_kv_net_factor(3.96))))/15.00)) + (cpu(19000*filters(0)*tidb_cpu_factor(49.9))) + (cpu(19000*10*tidb_cpu_factor(49.9))) + ((() + (((((scan(1*logrowsize(540)*tikv_scan_factor(40.7))) + (net(1*rowsize(540)*tidb_kv_net_factor(3.96))))/15.00)*19000.00)/6.00) + (cpu(19000*filters(0)*tidb_cpu_factor(49.9))) + ((hashkey(19000*0*tidb_cpu_factor(49.9))) + (hashmem(19000*540*tidb_mem_factor(0.2))) + (hashbuild(19000*tidb_cpu_factor(49.9)))))/5.00)) | root | | offset:18000, count:1000 |
| └─IndexJoin_54 | 19000.00 | 10556582.44 | (cpu(10*3*tidb_cpu_factor(49.9))) + (((((scan(19000*logrowsize(48)*tikv_scan_factor(40.7)))) + (net(19000*rowsize(16)*tidb_kv_net_factor(3.96))))/15.00)) + (cpu(19000*filters(0)*tidb_cpu_factor(49.9))) + (cpu(19000*10*tidb_cpu_factor(49.9))) + ((() + (((((scan(1*logrowsize(540)*tikv_scan_factor(40.7))) + (net(1*rowsize(540)*tidb_kv_net_factor(3.96))))/15.00)*19000.00)/6.00) + (cpu(19000*filters(0)*tidb_cpu_factor(49.9))) + ((hashkey(19000*0*tidb_cpu_factor(49.9))) + (hashmem(19000*540*tidb_mem_factor(0.2))) + (hashbuild(19000*tidb_cpu_factor(49.9)))))/5.00) | root | | left outer join, inner:TableReader_51, outer key:test.i.object_id, inner key:test.d.object_id, equal cond:eq(test.i.object_id, test.d.object_id) |
| ├─Limit_63(Build) | 19000.00 | 368179.43 | ((((scan(19000*logrowsize(48)*tikv_scan_factor(40.7)))) + (net(19000*rowsize(16)*tidb_kv_net_factor(3.96))))/15.00) | root | | offset:0, count:19000 |
| │ └─IndexReader_69 | 19000.00 | 368179.43 | (((scan(19000*logrowsize(48)*tikv_scan_factor(40.7)))) + (net(19000*rowsize(16)*tidb_kv_net_factor(3.96))))/15.00 | root | | index:Limit_68 |
| │ └─Limit_68 | 19000.00 | 4318851.50 | (scan(19000*logrowsize(48)*tikv_scan_factor(40.7))) | cop[tikv] | | offset:0, count:19000 |
| │ └─IndexRangeScan_44 | 19000.00 | 4318851.50 | scan(19000*logrowsize(48)*tikv_scan_factor(40.7)) | cop[tikv] | table:i, index:i_3(cid, object_id) | range:[249,249], keep order:true |
| └─TableReader_51(Probe) | 19000.00 | 167.19 | ((scan(1*logrowsize(540)*tikv_scan_factor(40.7))) + (net(1*rowsize(540)*tidb_kv_net_factor(3.96))))/15.00 | root | | data:TableRangeScan_50 |
| └─TableRangeScan_50 | 19000.00 | 369.43 | scan(1*logrowsize(540)*tikv_scan_factor(40.7)) | cop[tikv] | table:d | range: decided by [test.i.object_id], keep order:false |
+-------------------------------------+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
9 rows in set (0.00 sec)
Cost for hash join choice
mysql> explain format='cost_trace' SELECT /*+ HASH_JOIN(d) */ `d`.* FROM `i` LEFT JOIN `d` ON `i`.`object_id` = `d`.`object_id` WHERE `i`.`cid` = 249 ORDER BY `i`.`object_id` LIMIT 1000 OFFSET 18000;
+-------------------------------------+----------+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------------------------------------+-----------------------------------------------------------------+
| id | estRows | estCost | costFormula | task | access object | operator info |
+-------------------------------------+----------+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------------------------------------+-----------------------------------------------------------------+
| Projection_11 | 1000.00 | 19508812.18 | (((cpu(10*3*tidb_cpu_factor(49.9))) + (((((scan(19000*logrowsize(48)*tikv_scan_factor(40.7)))) + (net(19000*rowsize(16)*tidb_kv_net_factor(3.96))))/15.00)) + (((scan(30000*logrowsize(67.00999999999999)*tikv_scan_factor(40.7))) + (net(30000*rowsize(67.00999999999999)*tidb_kv_net_factor(3.96))))/15.00) + ((hashkey(19000*1*tidb_cpu_factor(49.9))) + (hashmem(19000*16*tidb_mem_factor(0.2))) + (hashbuild(19000*tidb_cpu_factor(49.9)))) + (cpu(19000*filters(0)*tidb_cpu_factor(49.9))) + (((cpu(30000*filters(0)*tidb_cpu_factor(49.9))) + ((hashkey(30000*1*tidb_cpu_factor(49.9))) + (hashprobe(30000*tidb_cpu_factor(49.9)))))/5.00)) + ((exprCPU(19000*0*tidb_cpu_factor(49.9))) + (orderCPU(19000*log(19000)*tidb_cpu_factor(49.9)))) + (topMem(19000*548*tidb_mem_factor(0.2)))) + ((cpu(1000*filters(0.04)*tidb_cpu_factor(49.9)))/5.00) | root | | test.d.mid, test.d.object_id, test.d.ov, test.d.version |
| └─TopN_14 | 1000.00 | 19508412.98 | ((cpu(10*3*tidb_cpu_factor(49.9))) + (((((scan(19000*logrowsize(48)*tikv_scan_factor(40.7)))) + (net(19000*rowsize(16)*tidb_kv_net_factor(3.96))))/15.00)) + (((scan(30000*logrowsize(67.00999999999999)*tikv_scan_factor(40.7))) + (net(30000*rowsize(67.00999999999999)*tidb_kv_net_factor(3.96))))/15.00) + ((hashkey(19000*1*tidb_cpu_factor(49.9))) + (hashmem(19000*16*tidb_mem_factor(0.2))) + (hashbuild(19000*tidb_cpu_factor(49.9)))) + (cpu(19000*filters(0)*tidb_cpu_factor(49.9))) + (((cpu(30000*filters(0)*tidb_cpu_factor(49.9))) + ((hashkey(30000*1*tidb_cpu_factor(49.9))) + (hashprobe(30000*tidb_cpu_factor(49.9)))))/5.00)) + ((exprCPU(19000*0*tidb_cpu_factor(49.9))) + (orderCPU(19000*log(19000)*tidb_cpu_factor(49.9)))) + (topMem(19000*548*tidb_mem_factor(0.2))) | root | | test.i.object_id, offset:18000, count:1000 |
| └─HashJoin_30 | 19000.00 | 3949992.82 | (cpu(10*3*tidb_cpu_factor(49.9))) + (((((scan(19000*logrowsize(48)*tikv_scan_factor(40.7)))) + (net(19000*rowsize(16)*tidb_kv_net_factor(3.96))))/15.00)) + (((scan(30000*logrowsize(67.00999999999999)*tikv_scan_factor(40.7))) + (net(30000*rowsize(67.00999999999999)*tidb_kv_net_factor(3.96))))/15.00) + ((hashkey(19000*1*tidb_cpu_factor(49.9))) + (hashmem(19000*16*tidb_mem_factor(0.2))) + (hashbuild(19000*tidb_cpu_factor(49.9)))) + (cpu(19000*filters(0)*tidb_cpu_factor(49.9))) + (((cpu(30000*filters(0)*tidb_cpu_factor(49.9))) + ((hashkey(30000*1*tidb_cpu_factor(49.9))) + (hashprobe(30000*tidb_cpu_factor(49.9)))))/5.00) | root | | left outer join, equal:[eq(test.i.object_id, test.d.object_id)] |
| ├─Limit_35(Build) | 19000.00 | 368179.43 | ((((scan(19000*logrowsize(48)*tikv_scan_factor(40.7)))) + (net(19000*rowsize(16)*tidb_kv_net_factor(3.96))))/15.00) | root | | offset:0, count:19000 |
| │ └─IndexReader_48 | 19000.00 | 368179.43 | (((scan(19000*logrowsize(48)*tikv_scan_factor(40.7)))) + (net(19000*rowsize(16)*tidb_kv_net_factor(3.96))))/15.00 | root | | index:Limit_47 |
| │ └─Limit_47 | 19000.00 | 4318851.50 | (scan(19000*logrowsize(48)*tikv_scan_factor(40.7))) | cop[tikv] | | offset:0, count:19000 |
| │ └─IndexRangeScan_46 | 19000.00 | 4318851.50 | scan(19000*logrowsize(48)*tikv_scan_factor(40.7)) | cop[tikv] | table:i, index:i_3(cid, object_id) | range:[249,249], keep order:true |
| └─TableReader_52(Probe) | 30000.00 | 1024516.39 | ((scan(30000*logrowsize(67.00999999999999)*tikv_scan_factor(40.7))) + (net(30000*rowsize(67.00999999999999)*tidb_kv_net_factor(3.96))))/15.00 | root | | data:TableFullScan_51 |
| └─TableFullScan_51 | 30000.00 | 7406957.80 | scan(30000*logrowsize(67.00999999999999)*tikv_scan_factor(40.7)) | cop[tikv] | table:d | keep order:false |
+-------------------------------------+----------+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------------------------------------+-----------------------------------------------------------------+
9 rows in set (0.00 sec)
For INL_JOIN, table range scan of d
should have estRows
of 1000, instead of 19000
For INL_JOIN, table range scan of
d
should haveestRows
of 1000, instead of 19000
@mzhang77 Oh, I got your point.
From the explain result, when we push a top-n down to the outer join's outer child. We'll remove the offset and make a full TopN Maybe this part can be improved. We'll check it later.
@mzhang77 Some explanation about why we remove the offset when pushing it down: The original TopN is applied after the join. One row from the outer join's outer child might match more than one row from the inner side. So we can only push down a TopN(0, 19000) to the outer child. We don't know which row from the outer side will produce the 1001-th join result. Due to this reason, we also don't know how many rows the inner side needs output to make the join output the 19000-th join result.
So currently, both sides will use 19000 rows to calculate the cost.
Oh, but I noticed that your join key from the inner side is the primary key, so this can ensure each match of the outer side is exactly one. So the total execution logic can be improved maybe.
If the join key doesn't have the unique property, i'm afraid that the execution logic and the estimation logic will not change.
@winoros This is a real case from a paid customer. Actually in their application all join key has unique property.
@winoros This is a real case from a paid customer. Actually in their application all join key has unique property.
i know. i just want to explain the current situation and the possible improvment tidb can do.
Bug Report
1. Minimal reproduce step (Required)
mysql> show create table d\G 1. row Table: d Create Table: CREATE TABLE
d
(mid
bigint(20) unsigned NOT NULL,object_id
bigint(20) unsigned NOT NULL AUTO_INCREMENT,ov
longblob DEFAULT NULL,version
int(11) unsigned NOT NULL, PRIMARY KEY (object_id
) /T![clustered_index] CLUSTERED / ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci AUTO_INCREMENT=23924932 /T![auto_id_cache] AUTO_ID_CACHE=1 / 1 row in set (0.00 sec)mysql> show create table i\G 1. row Table: i Create Table: CREATE TABLE
i
(object_id
bigint(20) unsigned NOT NULL,lid
bigint(20) DEFAULT NULL,sid
varbinary(767) DEFAULT NULL,cid
bigint(20) DEFAULT NULL,version
int(11) unsigned NOT NULL, PRIMARY KEY (object_id
) /T![clustered_index] CLUSTERED /, KEYi_1
(cid
,sid
,object_id
), KEYi_2
(lid
,object_id
), KEYi_3
(cid
,object_id
) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci 1 row in set (0.00 sec)mysql> explain SELECT ->
d
.* -> FROM ->i
-> LEFT JOINd
ONi
.object_id
=d
.object_id
-> WHERE ->i
.cid
= 249 -> ORDER BY ->i
.object_id
-> LIMIT 1000 OFFSET 18000;2. What did you expect to see? (Required)
The execution plan should use table
i
as driving table to access tabled
3. What did you see instead (Required)
The problem is captured by slow log, it is not reproducible any more.
The issue with this plan is, there is no need to full table scan
d
. Optimizer should offset 18000 rows fromi
, take the next 1000 rows, and join them to tabled
. So maybe full table scani
is necessary, but full table scand
to do a merge join is unnecessary.4. What is your TiDB version? (Required)
v7.5.1