Closed wjhuang2016 closed 10 months ago
Run in mysql:
MySQL [test]> select count(1) from tcfc8cb35 where col_85 = ''; +----------+ | count(1) | +----------+ | 6 | +----------+ 1 row in set (0.001 sec)
So the 110 rows should be the correct result.
2. I think the key problem is not collation, because the the rows that incorrect with empty col_85. I have test the case with gbk or not, HashAgg always gets 110 rows and StreamAgg always gets 115 rows.
3. Why StreamAgg can't get the correct result?
We can see the plan for StreamAgg.
tidb> desc analyze SELECT /+ AGG_TO_COP() STREAM_AGG()/ (NOT (tcfc8cb35
.col_85
>=_UTF8MB4'j筧8') OR NOT (tcfc8cb35
.col_85
=_UTF8MB4'暈lH忧ll6')) IS TRUE,MAX(tcfc8cb35
.col_81
) AS r0
,QUOTE(tcfc8cb35
.col_85
) AS r1
FROM tcfc8cb35
WHERE tcfc8cb35
.col_84
>240817 OR tcfc8cb35
.col_85
BETWEEN _UTF8MB4'WVz' AND _UTF8MB4'G#駧褉ZC領*lov' GROUP BY tcfc8cb35
.col_85
;
+------------------------------------+---------+---------+-----------+---------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +------------------------------------+---------+---------+-----------+---------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+ | Projection_7 | 30.67 | 115 | root | | time:1.7ms, loops:2, RU:1.341432, Concurrency:OFF | istrue(or(not(ge(test.tcfc8cb35.col_85, j筧8)), not(eq(test.tcfc8cb35.col_85, 暈lH忧ll6))))->Column#11, Column#10, quote(test.tcfc8cb35.col_85)->Column#12 | 4.76 KB | N/A | | └─StreamAgg_11 | 30.67 | 115 | root | | time:1.65ms, loops:2 | group by:test.tcfc8cb35.col_85, funcs:max(test.tcfc8cb35.col_81)->Column#10, funcs:firstrow(test.tcfc8cb35.col_85)->test.tcfc8cb35.col_85 | 5.87 KB | N/A | | └─IndexMerge_19 | 38.33 | 115 | root | | time:1.61ms, loops:2, index_task:{fetch_handle:681.274µs, merge:18.815µs}, table_task:{num:1, concurrency:5, fetch_row:820.815µs, wait_time:774.578µs} | type: union | 39.0 KB | N/A | | ├─TableRangeScan_16(Build) | 38.33 | 115 | cop[tikv] | table:tcfc8cb35, partition:p0 | time:666.1µs, loops:3, cop_task: {num: 1, max: 630.5µs, proc_keys: 115, tot_proc: 77.6µs, tot_wait: 201.5µs, rpc_num: 1, rpc_time: 615.2µs, copr_cache_hit_ratio: 0.00, build_task_duration: 5.3µs, max_distsql_concurrency: 1}, tikv_task:{time:1ms, loops:3}, scan_detail: {total_process_keys: 115, total_process_keys_size: 4140, total_keys: 116, get_snapshot_time: 149.2µs, rocksdb: {delete_skipped_count: 115, key_skipped_count: 230, block: {}}} | range:(240817,+inf], keep order:false, stats:partial[col_84:unInitialized, col_85:unInitialized] | 2.10 KB | N/A | | ├─IndexFullScan_17(Build) | 0.00 | 0 | cop[tikv] | table:tcfc8cb35, partition:p0, index:idx_25(col_84, col_85, col_81) | time:6.63µs, loops:1 | keep order:false, stats:partial[col_84:unInitialized, col_85:unInitialized] | N/A | N/A | | └─TableRowIDScan_18(Probe) | 38.33 | 115 | cop[tikv] | table:tcfc8cb35, partition:p0 | time:786.9µs, loops:2, cop_task: {num: 1, max: 752µs, proc_keys: 115, tot_proc: 317.1µs, tot_wait: 57.1µs, rpc_num: 1, rpc_time: 728.5µs, copr_cache_hit_ratio: 0.00, build_task_duration: 4.28µs, max_distsql_concurrency: 1}, tikv_task:{time:0s, loops:3}, scan_detail: {total_process_keys: 115, total_process_keys_size: 12891, total_keys: 230, get_snapshot_time: 9.2µs, rocksdb: {delete_skipped_count: 115, key_skipped_count: 230, block: {}}} | keep order:false, stats:partial[col_84:unInitialized, col_85:unInitialized] | N/A | N/A | +------------------------------------+---------+---------+-----------+---------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+ 6 rows in set, 1 warning (0.002 sec)
I think the IndexMerge can't keep the col_85 order, but we don't generate an extra sort to keep col_85 order.
4. Try to avoid index_merge in this case. It get the correct result.
tidb> desc analyze SELECT /+ no_index_merge() AGG_TO_COP() STREAM_AGG()/ (NOT (tcfc8cb35
.col_85
>=_UTF8MB4'j筧8') OR NOT (tcfc8cb35
.col_85
=_UTF8MB4'暈lH忧ll6')) IS TRUE,MAX(tcfc8cb35
.col_81
) AS r0
,QUOTE(tcfc8cb35
.col_85
) AS r1
FROM tcfc8cb35
WHERE tcfc8cb35
.col_84
>240817 OR tcfc8cb35
.col_85
BETWEEN _UTF8MB4'WVz' AND _UTF8MB4'G#駧褉ZC領lov' GROUP BY tcfc8cb35
.col_85
;
+--------------------------------+---------+---------+-----------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+--------------------------------+---------+---------+-----------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+
| Projection_7 | 30.67 | 110 | root | | time:815.6µs, loops:2, RU:0.723660, Concurrency:OFF | istrue(or(not(ge(test.tcfc8cb35.col_85, j筧8)), not(eq(test.tcfc8cb35.col_85, 暈lH忧ll6))))->Column#11, Column#10, quote(test.tcfc8cb35.col_85)->Column#12 | 11.5 KB | N/A |
| └─StreamAgg_10 | 30.67 | 110 | root | | time:762.9µs, loops:2 | group by:test.tcfc8cb35.col_85, funcs:max(test.tcfc8cb35.col_81)->Column#10, funcs:firstrow(test.tcfc8cb35.col_85)->test.tcfc8cb35.col_85 | 20.6 KB | N/A |
| └─Sort_15 | 38.33 | 115 | root | | time:725.6µs, loops:2 | test.tcfc8cb35.col_85 | 8.41 KB | 0 Bytes |
| └─TableReader_14 | 38.33 | 115 | root | | time:681µs, loops:2, cop_task: {num: 1, max: 671.4µs, proc_keys: 115, tot_proc: 155.9µs, tot_wait: 143.4µs, rpc_num: 1, rpc_time: 651.1µs, copr_cache_hit_ratio: 0.00, build_task_duration: 5.97µs, max_distsql_concurrency: 1} | data:Selection_13 | 4.84 KB | N/A |
| └─Selection_13 | 38.33 | 115 | cop[tikv] | | tikv_task:{time:0s, loops:3}, scan_detail: {total_process_keys: 115, total_process_keys_size: 12891, total_keys: 116, get_snapshot_time: 125.4µs, rocksdb: {delete_skipped_count: 115, key_skipped_count: 230, block: {}}} | or(gt(test.tcfc8cb35.col_84, 240817), and(ge(test.tcfc8cb35.col_85, "WVz"), le(test.tcfc8cb35.col_85, "G#駧褉ZC領lov"))) | N/A | N/A |
| └─TableFullScan_12 | 115.00 | 115 | cop[tikv] | table:tcfc8cb35, partition:p0 | tikv_task:{time:0s, loops:3} | keep order:false, stats:partial[col_84:unInitialized, col_85:unInitialized] | N/A | N/A |
+--------------------------------+---------+---------+-----------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+
6 rows in set, 1 warning (0.002 sec)
- I have test in mysql and get the result 110 rows.. And we can run the sql following:
Run in mysql: MySQL [test]> select count(1) from tcfc8cb35 where col_85 = ''; +----------+ | count(1) | +----------+ | 6 | +----------+ 1 row in set (0.001 sec)
So the 110 rows should be the correct result.
- I think the key problem is not collation, because the the rows that incorrect with empty col_85. I have test the case with gbk or not, HashAgg always gets 110 rows and StreamAgg always gets 115 rows.
- Why StreamAgg can't get the correct result? We can see the plan for StreamAgg.
tidb> desc analyze SELECT /*+ AGG_TO_COP() STREAM_AGG()*/ (NOT (`tcfc8cb35`.`col_85`>=_UTF8MB4'j筧8') OR NOT (`tcfc8cb35`.`col_85`=_UTF8MB4'暈lH忧ll6')) IS TRUE,MAX(`tcfc8cb35`.`col_81`) AS `r0`,QUOTE(`tcfc8cb35`.`col_85`) AS `r1` FROM `tcfc8cb35` WHERE `tcfc8cb35`.`col_84`>240817 OR `tcfc8cb35`.`col_85` BETWEEN _UTF8MB4'WVz' AND _UTF8MB4'G#駧褉ZC領*lov' GROUP BY `tcfc8cb35`.`col_85`; +------------------------------------+---------+---------+-----------+---------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +------------------------------------+---------+---------+-----------+---------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+ | Projection_7 | 30.67 | 115 | root | | time:1.7ms, loops:2, RU:1.341432, Concurrency:OFF | istrue(or(not(ge(test.tcfc8cb35.col_85, j筧8)), not(eq(test.tcfc8cb35.col_85, 暈lH忧ll6))))->Column#11, Column#10, quote(test.tcfc8cb35.col_85)->Column#12 | 4.76 KB | N/A | | └─StreamAgg_11 | 30.67 | 115 | root | | time:1.65ms, loops:2 | group by:test.tcfc8cb35.col_85, funcs:max(test.tcfc8cb35.col_81)->Column#10, funcs:firstrow(test.tcfc8cb35.col_85)->test.tcfc8cb35.col_85 | 5.87 KB | N/A | | └─IndexMerge_19 | 38.33 | 115 | root | | time:1.61ms, loops:2, index_task:{fetch_handle:681.274µs, merge:18.815µs}, table_task:{num:1, concurrency:5, fetch_row:820.815µs, wait_time:774.578µs} | type: union | 39.0 KB | N/A | | ├─TableRangeScan_16(Build) | 38.33 | 115 | cop[tikv] | table:tcfc8cb35, partition:p0 | time:666.1µs, loops:3, cop_task: {num: 1, max: 630.5µs, proc_keys: 115, tot_proc: 77.6µs, tot_wait: 201.5µs, rpc_num: 1, rpc_time: 615.2µs, copr_cache_hit_ratio: 0.00, build_task_duration: 5.3µs, max_distsql_concurrency: 1}, tikv_task:{time:1ms, loops:3}, scan_detail: {total_process_keys: 115, total_process_keys_size: 4140, total_keys: 116, get_snapshot_time: 149.2µs, rocksdb: {delete_skipped_count: 115, key_skipped_count: 230, block: {}}} | range:(240817,+inf], keep order:false, stats:partial[col_84:unInitialized, col_85:unInitialized] | 2.10 KB | N/A | | ├─IndexFullScan_17(Build) | 0.00 | 0 | cop[tikv] | table:tcfc8cb35, partition:p0, index:idx_25(col_84, col_85, col_81) | time:6.63µs, loops:1 | keep order:false, stats:partial[col_84:unInitialized, col_85:unInitialized] | N/A | N/A | | └─TableRowIDScan_18(Probe) | 38.33 | 115 | cop[tikv] | table:tcfc8cb35, partition:p0 | time:786.9µs, loops:2, cop_task: {num: 1, max: 752µs, proc_keys: 115, tot_proc: 317.1µs, tot_wait: 57.1µs, rpc_num: 1, rpc_time: 728.5µs, copr_cache_hit_ratio: 0.00, build_task_duration: 4.28µs, max_distsql_concurrency: 1}, tikv_task:{time:0s, loops:3}, scan_detail: {total_process_keys: 115, total_process_keys_size: 12891, total_keys: 230, get_snapshot_time: 9.2µs, rocksdb: {delete_skipped_count: 115, key_skipped_count: 230, block: {}}} | keep order:false, stats:partial[col_84:unInitialized, col_85:unInitialized] | N/A | N/A | +------------------------------------+---------+---------+-----------+---------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+ 6 rows in set, 1 warning (0.002 sec)
I think the IndexMerge can't keep the col_85 order, but we don't generate an extra sort to keep col_85 order.
- Try to avoid index_merge in this case. It get the correct result.
tidb> desc analyze SELECT /*+ no_index_merge() AGG_TO_COP() STREAM_AGG()*/ (NOT (`tcfc8cb35`.`col_85`>=_UTF8MB4'j筧8') OR NOT (`tcfc8cb35`.`col_85`=_UTF8MB4'暈lH忧ll6')) IS TRUE,MAX(`tcfc8cb35`.`col_81`) AS `r0`,QUOTE(`tcfc8cb35`.`col_85`) AS `r1` FROM `tcfc8cb35` WHERE `tcfc8cb35`.`col_84`>240817 OR `tcfc8cb35`.`col_85` BETWEEN _UTF8MB4'WVz' AND _UTF8MB4'G#駧褉ZC領*lov' GROUP BY `tcfc8cb35`.`col_85`; +--------------------------------+---------+---------+-----------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +--------------------------------+---------+---------+-----------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+ | Projection_7 | 30.67 | 110 | root | | time:815.6µs, loops:2, RU:0.723660, Concurrency:OFF | istrue(or(not(ge(test.tcfc8cb35.col_85, j筧8)), not(eq(test.tcfc8cb35.col_85, 暈lH忧ll6))))->Column#11, Column#10, quote(test.tcfc8cb35.col_85)->Column#12 | 11.5 KB | N/A | | └─StreamAgg_10 | 30.67 | 110 | root | | time:762.9µs, loops:2 | group by:test.tcfc8cb35.col_85, funcs:max(test.tcfc8cb35.col_81)->Column#10, funcs:firstrow(test.tcfc8cb35.col_85)->test.tcfc8cb35.col_85 | 20.6 KB | N/A | | └─Sort_15 | 38.33 | 115 | root | | time:725.6µs, loops:2 | test.tcfc8cb35.col_85 | 8.41 KB | 0 Bytes | | └─TableReader_14 | 38.33 | 115 | root | | time:681µs, loops:2, cop_task: {num: 1, max: 671.4µs, proc_keys: 115, tot_proc: 155.9µs, tot_wait: 143.4µs, rpc_num: 1, rpc_time: 651.1µs, copr_cache_hit_ratio: 0.00, build_task_duration: 5.97µs, max_distsql_concurrency: 1} | data:Selection_13 | 4.84 KB | N/A | | └─Selection_13 | 38.33 | 115 | cop[tikv] | | tikv_task:{time:0s, loops:3}, scan_detail: {total_process_keys: 115, total_process_keys_size: 12891, total_keys: 116, get_snapshot_time: 125.4µs, rocksdb: {delete_skipped_count: 115, key_skipped_count: 230, block: {}}} | or(gt(test.tcfc8cb35.col_84, 240817), and(ge(test.tcfc8cb35.col_85, "WVz"), le(test.tcfc8cb35.col_85, "G#駧褉ZC領*lov"))) | N/A | N/A | | └─TableFullScan_12 | 115.00 | 115 | cop[tikv] | table:tcfc8cb35, partition:p0 | tikv_task:{time:0s, loops:3} | keep order:false, stats:partial[col_84:unInitialized, col_85:unInitialized] | N/A | N/A | +--------------------------------+---------+---------+-----------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+---------+ 6 rows in set, 1 warning (0.002 sec)
You are right. Maybe I have some mistake database when I test the no collation table. I will go ahead to focus this issue.
I try the another plan with only index reader which will add the sort node under the stream agg when the sort of child plan is not satisfy the requirement.
mysql> create table t1 (k1 int, k2 int);
Query OK, 0 rows affected (0.08 sec)
mysql> alter table t1 add index idxk2_k1 (k2, k1);
Query OK, 0 rows affected (0.56 sec)
mysql> explain select /*+ AGG_TO_COP() STREAM_AGG()*/ k1, max(k2) from t1 where k2 between 1 and 10 group by k1;
+-------------------------------+---------+-----------+----------------------------------+--------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+---------+-----------+----------------------------------+--------------------------------------------------------------------------------------------------------------------------+
| Projection_5 | 200.00 | root | | nocollation.t1.k1, Column#4 |
| └─StreamAgg_8 | 200.00 | root | | group by:nocollation.t1.k1, funcs:max(nocollation.t1.k2)->Column#4, funcs:firstrow(nocollation.t1.k1)->nocollation.t1.k1 |
| └─Sort_12 | 250.00 | root | | nocollation.t1.k1 |
| └─IndexReader_11 | 250.00 | root | | index:IndexRangeScan_10 |
| └─IndexRangeScan_10 | 250.00 | cop[tikv] | table:t1, index:idxk2_k1(k2, k1) | range:[1,10], keep order:false, stats:pseudo |
+-------------------------------+---------+-----------+----------------------------------+--------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)
So, the optimizer should add the sort node under the stream agg if the sort property of child plan is not satisfy the requirement.
Optimizer lost the sort property in somewhere when the under plan contains the index merge.
@AilinKid Please take a look
streamAggregate does enumerated prop{sortItem}, and when we enforce the sort operator for this kind of prop, the judgment of task.plan() function == nil does mean this plan is invalid, especially for indexMergeReader. (we keep index-plan-hasn't finished for the convenience of pushing more LIMIT down to its partial index paths)
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
2. What did you expect to see? (Required)
Two queries get the same results.
3. What did you see instead (Required)
4. What is your TiDB version? (Required)
v7.6.0-alpha-342-g5883c5ba34-dirty