pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.88k stars 5.8k forks source link

How to be push down to TiKV by use Hash Aggregation. #10960

Closed Danier-Evens closed 3 months ago

Danier-Evens commented 5 years ago

TiDB version

image

TiDB Cluster topology

TiDB 8CPU 16G 100G PD 5CPU 8G 200G TiKV 8CPU 32G 200G(SDD)

Question

I use this cluster test a single table aggregation on a 2kw+ table,view the execution plan, the aggregation method is StreamAgg,TiDB documentation have write "hash aggregation will push down tikv to improve the concurrency and reduce the network load.",but I don't understand "If Hash Aggregation is close to the read operator of Table or Index" what does it mean and I don't know how to make my sql to use has agg. my sql execution is slow (3min+), performance bottlenecks are aggregated on the tidb node. Can you help me solve it? I have already set concurrency like set @@session.tidb_distsql_scan_concurrency=40; set @@session.tidb_index_lookup_concurrency=40;

image

table count image

single aggregation scan table total count image

SQL

SELECT `datestr` AS dimension_19700,
    sum((case when typeid='CHENIU_SOUSUO_GUANJIANZI' then pv else 0 end)) AS measure_35041,
    sum((case when typeid='CHENIU_SOUSUO_GUANJIANZI' then uv else 0 end)) AS measure_35042,
    sum((case when typeid='CHENIU_SOUSUO_CHEXI' then pv else 0 end)) AS measure_35043,
    sum((case when typeid='CHENIU_SOUSUO_CHEXI' then uv else 0 end)) AS measure_35044,
    sum((case when typeid='CHENIU_SOUSUO_DIQU' then uv else 0 end)) AS measure_35045,
    sum((case when typeid='CHENIU_SOUSUO_CHELING' then pv else 0 end)) AS measure_35046,
    sum((case when typeid='CHENIU_SOUSUO_CHELING' then uv else 0 end)) AS measure_35047,
    sum((case when typeid='CHENIU_SOUSUO_JIAGE' then pv else 0 end)) AS measure_35048,
    sum((case when typeid='CHENIU_SOUSUO_JIAGE' then uv else 0 end)) AS measure_35049,
    sum((case when typeid='CHENIU_SOUSUO_LICHENG' then pv else 0 end)) AS measure_35050,
    sum((case when typeid='CHENIU_SOUSUO_LICHENG' then uv else 0 end)) AS measure_35051,
    sum((case when typeid='CHENIU_SOUSUO_YANSE' then pv else 0 end)) AS measure_35052,
    sum((case when typeid='CHENIU_SOUSUO_YANSE' then uv else 0 end)) AS measure_35053,
    sum((case when typeid='CHENIU_SOUSUO_PAIFANG' then pv else 0 end)) AS measure_35054,
    sum((case when typeid='CHENIU_SOUSUO_PAIFANG' then uv else 0 end)) AS measure_35055,
    sum((case when typeid='CHENIU_SOUSUO_CHANDI' then pv else 0 end)) AS measure_35056,
    sum((case when typeid='CHENIU_SOUSUO_CHANDI' then uv else 0 end)) AS measure_35057,
    sum((case when typeid='CHENIU_SOUSUO_BIANSUXIANG' then pv else 0 end)) AS measure_35058,
    sum((case when typeid='CHENIU_SOUSUO_LEIXING' then pv else 0 end)) AS measure_35059,
    sum((case when typeid='CHENIU_SOUSUO_LEIXING' then uv else 0 end)) AS measure_35060,
    sum((case when typeid='CHENIU_SOUSUO_PAIXU' then pv else 0 end)) AS measure_35061,
    sum((case when typeid='CHENIU_SOUSUO_PAIXU' then uv else 0 end)) AS measure_35062,
    sum((case when typeid='CHENIU_SOUSUO_TWO' then uv else 0 end)) AS measure_35063,
    sum((case when typeid='CHENIU_SOUSUO_THREE' then pv else 0 end)) AS measure_35064,
    sum((case when typeid='CHENIU_SOUSUO_THREE' then uv else 0 end)) AS measure_35065,
    sum((case when typeid='CHENIU_SOUSUO_FOUR' then pv else 0 end)) AS measure_35066,
    sum((case when typeid='CHENIU_SOUSUO_FOUR' then uv else 0 end)) AS measure_35067
FROM rpt_dm_biz_trackmodel_count_dd
WHERE (`datestr` >= "2019-03-24" AND `datestr` <= "2019-06-21")
GROUP BY dimension_19700
ORDER BY `datestr` desc
LIMIT 1000;

EXPLAIN

| id             | parents      | children       | task | operator info                                                                                                  | count   |

| IndexScan_82   |              |                | cop  | table:rpt_dm_biz_trackmodel_count_dd, index:datestr, range:[2019-03-24,2019-06-21], keep order:true, desc      | 1249.69 |
| TableScan_83   |              |                | cop  | table:rpt_dm_biz_trackmodel_count_dd, keep order:false                                                         | 1249.69 |
| IndexLookUp_84 | StreamAgg_67 |                | root | index:IndexScan_82, table:TableScan_83                                                                         |1249.69 |
| StreamAgg_67   | Limit_15     | IndexLookUp_84 | root | group by:reportdb.rpt_dm_biz_trackmodel_count_dd.datestr, funcs:sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_GUANJIANZI), reportdb.rpt_dm_biz_trackmodel_count_dd.pv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_GUANJIANZI), reportdb.rpt_dm_biz_trackmodel_count_dd.uv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_CHEXI), reportdb.rpt_dm_biz_trackmodel_count_dd.pv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_CHEXI), reportdb.rpt_dm_biz_trackmodel_count_dd.uv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_DIQU), reportdb.rpt_dm_biz_trackmodel_count_dd.uv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_CHELING), reportdb.rpt_dm_biz_trackmodel_count_dd.pv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_CHELING), reportdb.rpt_dm_biz_trackmodel_count_dd.uv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_JIAGE), reportdb.rpt_dm_biz_trackmodel_count_dd.pv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_JIAGE), reportdb.rpt_dm_biz_trackmodel_count_dd.uv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_LICHENG), reportdb.rpt_dm_biz_trackmodel_count_dd.pv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_LICHENG), reportdb.rpt_dm_biz_trackmodel_count_dd.uv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_YANSE), reportdb.rpt_dm_biz_trackmodel_count_dd.pv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_YANSE), reportdb.rpt_dm_biz_trackmodel_count_dd.uv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_PAIFANG), reportdb.rpt_dm_biz_trackmodel_count_dd.pv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_PAIFANG), reportdb.rpt_dm_biz_trackmodel_count_dd.uv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_CHANDI), reportdb.rpt_dm_biz_trackmodel_count_dd.pv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_CHANDI), reportdb.rpt_dm_biz_trackmodel_count_dd.uv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_BIANSUXIANG), reportdb.rpt_dm_biz_trackmodel_count_dd.pv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_LEIXING), reportdb.rpt_dm_biz_trackmodel_count_dd.pv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_LEIXING), reportdb.rpt_dm_biz_trackmodel_count_dd.uv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_PAIXU), reportdb.rpt_dm_biz_trackmodel_count_dd.pv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_PAIXU), reportdb.rpt_dm_biz_trackmodel_count_dd.uv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_TWO), reportdb.rpt_dm_biz_trackmodel_count_dd.uv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_THREE), reportdb.rpt_dm_biz_trackmodel_count_dd.pv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_THREE), reportdb.rpt_dm_biz_trackmodel_count_dd.uv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_FOUR), reportdb.rpt_dm_biz_trackmodel_count_dd.pv, 0)), sum(case(eq(reportdb.rpt_dm_biz_trackmodel_count_dd.typeid, CHENIU_SOUSUO_FOUR), reportdb.rpt_dm_biz_trackmodel_count_dd.uv, 0)), firstrow(reportdb.rpt_dm_biz_trackmodel_count_dd.datestr)                                                                                                                           |1000.00 |
| Limit_15       | Projection_9 | StreamAgg_67   | root | offset:0, count:1000                                                                                             | 1000.00 |
| Projection_9   |              | Limit_15       | root | reportdb.rpt_dm_biz_trackmodel_count_dd.dimension_19700, measure_35041, measure_35042, measure_35043, measure_35044, measure_35045, measure_35046, measure_35047, measure_35048, measure_35049, measure_35050, measure_35051, measure_35052, measure_35053, measure_35054, measure_35055, measure_35056, measure_35057, measure_35058, measure_35059, measure_35060, measure_35061, measure_35062, measure_35063, measure_35064, measure_35065, measure_35066, measure_35067  

Table DDL

CREATE TABLE  if not exists `rpt_dm_biz_trackmodel_count_dd` (
  `datestr` date DEFAULT NULL ,
  `typeid` varchar(250) DEFAULT 'none' ,
  `pv` int(11) DEFAULT '0',
  `uv` int(11) DEFAULT '0',
  `last_update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ,
  UNIQUE KEY `type_date_index` (`typeid`,`datestr`),
  KEY `datestr` (`datestr`),
  KEY `typeid` (`typeid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

System Variables

mysql> show variables like "%tidb_%_concurrency";
+------------------------------------+-------+
| Variable_name                      | Value |
+------------------------------------+-------+
| tidb_index_lookup_join_concurrency | 4     |
| tidb_hash_join_concurrency         | 5     |
| tidb_index_lookup_concurrency      | 40    |
| tidb_build_stats_concurrency       | 4     |
| tidb_distsql_scan_concurrency      | 40    |
| tidb_index_serial_scan_concurrency | 1     |
| tidb_checksum_table_concurrency    | 4     |
+------------------------------------+-------+
7 rows in set (0.01 sec)

mysql> show variables like "%tidb%size%";
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| tidb_dml_batch_size        | 20000 |
| tidb_index_lookup_size     | 20000 |
| tidb_index_join_batch_size | 25000 |
| tidb_max_chunk_size        | 1024  |
+----------------------------+-------+
4 rows in set (0.00 sec)
ljluestc commented 1 year ago
SELECT `datestr` AS dimension_19700,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_GUANJIANZI' THEN pv ELSE 0 END)) AS measure_35041,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_GUANJIANZI' THEN uv ELSE 0 END)) AS measure_35042,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_CHEXI' THEN pv ELSE 0 END)) AS measure_35043,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_CHEXI' THEN uv ELSE 0 END)) AS measure_35044,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_DIQU' THEN uv ELSE 0 END)) AS measure_35045,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_CHELING' THEN pv ELSE 0 END)) AS measure_35046,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_CHELING' THEN uv ELSE 0 END)) AS measure_35047,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_JIAGE' THEN pv ELSE 0 END)) AS measure_35048,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_JIAGE' THEN uv ELSE 0 END)) AS measure_35049,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_LICHENG' THEN pv ELSE 0 END)) AS measure_35050,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_LICHENG' THEN uv ELSE 0 END)) AS measure_35051,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_YANSE' THEN pv ELSE 0 END)) AS measure_35052,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_YANSE' THEN uv ELSE 0 END)) AS measure_35053,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_PAIFANG' THEN pv ELSE 0 END)) AS measure_35054,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_PAIFANG' THEN uv ELSE 0 END)) AS measure_35055,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_CHANDI' THEN pv ELSE 0 END)) AS measure_35056,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_CHANDI' THEN uv ELSE 0 END)) AS measure_35057,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_BIANSUXIANG' THEN pv ELSE 0 END)) AS measure_35058,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_LEIXING' THEN pv ELSE 0 END)) AS measure_35059,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_LEIXING' THEN uv ELSE 0 END)) AS measure_35060,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_PAIXU' THEN pv ELSE 0 END)) AS measure_35061,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_PAIXU' THEN uv ELSE 0 END)) AS measure_35062,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_TWO' THEN uv ELSE 0 END)) AS measure_35063,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_THREE' THEN pv ELSE 0 END)) AS measure_35064,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_THREE' THEN uv ELSE 0 END)) AS measure_35065,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_FOUR' THEN pv ELSE 0 END)) AS measure_35066,
    SUM((CASE WHEN typeid='CHENIU_SOUSUO_FOUR' THEN uv ELSE 0 END)) AS measure_35067
FROM rpt_dm_biz_trackmodel_count_dd
WHERE (`datestr` BETWEEN '2019-03-24' AND '2019-06-21')
GROUP BY dimension_19700
ORDER BY `datestr` DESC
LIMIT 1000;

Regarding the EXPLAIN output, it indicates that the query is performing an IndexScan and a StreamAgg operation. To encourage the use of Hash Aggregation, you can follow the suggestions mentioned in the previous response. Adjust the tidb_hash_join_concurrency and tidb_index_lookup_join_concurrency variables, optimize table indexes, rewrite the query if necessary, and analyze table statistics.

jebter commented 3 months ago

You can ask new questions at https://asktug.com/. I will close this issue. If there are any updates, you can reopen it.