pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.88k stars 5.8k forks source link

How to optimize the simple aggregation SQL ? #9208

Closed vkingnew closed 4 years ago

vkingnew commented 5 years ago

Feature Request

Is your feature request related to a problem? Please describe:

select tidb_version()\G
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Release Version: v2.1.1                                                                                                                                                                                                                                                                                               
Git Commit Hash: 990f859384b834fe5edf1410d2a4142def5c3fa9                                                                                                                                                                                                                                                             
Git Branch: release-2.1                                                                                                                                                                                                                                                                                               
UTC Build Time: 2018-12-12 10:57:44                                                                                                                                                                                                                                                                                   
GoVersion: go version go1.11.2 linux/amd64                                                                                                                                                                                                                                                                            
Race Enabled: false                                                                                                                                                                                                                                                                                                   
TiKV Min Version: 2.1.0-alpha.1-ff3dd160846b7d1aed9079c389fc188f7f5ea13e                                                                                                                                                                                                                                              
Check Table Before Drop: false 

server machince: SSD 2T 3 tidb 3 tikv (12 tikv instance) 2 pd

the table have 110293676 record.

A big table with 1.6 billion rows,how optime it ? just a simple aggregation,for example:

SELECT
  DATE_FORMAT(create_time,'%Y-%m-%d'),s.app_id,'' app_type,s.app_version,'' page_id,'' city_id,
  COUNT(s.id),
  COUNT(DISTINCT IFNULL(s.user_id,s.uuid)) users,
  SUM(IFNULL(s.page_duration,0))
FROM track.ods_online_log s
  WHERE s.create_time >=DATE_SUB(CURDATE(),INTERVAL 3 DAY)
GROUP BY DATE_FORMAT(create_time,'%Y-%m-%d'),s.app_id,s.app_version

--indexes:

Table           Non_unique  Key_name                     Seq_in_index  Column_name  Collation  Cardinality  Sub_part  Packed  Null    Index_type  Comment  Index_comment  
--------------  ----------  ---------------------------  ------------  -----------  ---------  -----------  --------  ------  ------  ----------  -------  ---------------
ods_online_log           0  PRIMARY                                 1  id           A                    0    (NULL)  (NULL)          BTREE                               
ods_online_log           1  idx_logDay                              1  data_time    A                    0    (NULL)  (NULL)  YES     BTREE                               
ods_online_log           1  ix_createtime                           1  create_time  A                    0    (NULL)  (NULL)  YES     BTREE                               
ods_online_log           1  ix_createtime_appid_version             1  create_time  A                    0    (NULL)  (NULL)  YES     BTREE                               
ods_online_log           1  ix_createtime_appid_version             2  app_id       A                    0    (NULL)  (NULL)  YES     BTREE                               
ods_online_log           1  ix_createtime_appid_version             3  app_version  A                    0    (NULL)  (NULL)  YES     BTREE     

the explain:

id                      count        task    operator info                                                                                                                                                                                                                                    
----------------------  -----------  ------  -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Projection_5            278754.99    root    date_format(s.create_time, "%Y-%m-%d"), s.app_id, "", s.app_version, "", "", 3_col_0, 3_col_1, 3_col_2                                                                                                                                           
└─HashAgg_7             278754.99    root    group by:date_format(s.create_time, "%Y-%m-%d"), s.app_id, s.app_version, funcs:count(s.id), count(distinct ifnull(s.user_id, s.uuid)), sum(ifnull(s.page_duration, "0")), firstrow(s.app_id), firstrow(s.app_version), firstrow(s.create_time)  
  └─IndexLookUp_18      13661552.59  root                                                                                                                                                                                                                                                     
    ├─IndexScan_14      13661552.59  cop     table:s, index:create_time, range:[2019-01-26 00:00:00,+inf], keep order:false                                                                                                                                                                   
    └─TableScan_15      13661552.59  cop     table:ods_online_log, keep order:false                                                                                                                                                                                                           

and it's hard to create index:(too long) CREATE INDEX ix_createtime_appid_version ON track.ods_online_log(create_time,app_id,app_version)

执行耗时 : 29 min 18 sec

current optimize method: 1.single column index 2.dual column index 3.analyze table xx to collect statistics

any other methods to optimize the simple SQL to make it run within 3s?

Describe the feature you'd like:

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

vkingnew commented 5 years ago

run the above SQL waste 46s, How to reduce it to 3s? we predict the table will be hold 10billion records。

alivxxx commented 5 years ago

Thanks for your feedback! Could you post the result of explain analyze SELECT DATE_FORMAT(create_time,'%Y-%m-%d'),s.app_id,'' app_type,s.app_version,'' page_id,'' city_id, COUNT(s.id), COUNT(DISTINCT IFNULL(s.user_id,s.uuid)) users, SUM(IFNULL(s.page_duration,0)) FROM track.ods_online_log s WHERE s.create_time >=DATE_SUB(CURDATE(),INTERVAL 3 DAY) GROUP BY DATE_FORMAT(create_time,'%Y-%m-%d'),s.app_id,s.app_version

vkingnew commented 5 years ago
id                      count        task    operator info                                                                                                                                                    execution info                                  
----------------------  -----------  ------  ---------------------------------------------------------------------------------------------------------------------------------------------------------------  ------------------------------------------------
Projection_5            279805.25    root    date_format(s.create_time, "%Y-%m-%d"), s.app_id, "", s.app_version, "", "", 3_col_0, 3_col_1, 3_col_2                                                           time:44.531143556s, loops:2, rows:32            
└─HashAgg_7             279805.25    root    group by:group_0, group_1, group_2, funcs:count(count_0), count(distinct count_0), sum(sum_0), firstrow(firstrow_0), firstrow(firstrow_0), firstrow(firstrow_0)  time:44.530973193s, loops:2, rows:32            
  └─IndexLookUp_18      13713025.06  root                                                                                                                                                                     time:7.065943833s, loops:448438, rows:14349979  
    ├─IndexScan_14      13713025.06  cop     table:s, index:create_time, range:[2019-01-26 00:00:00,+inf], keep order:false                                                                                                                                   
    └─TableScan_15      13713025.06  cop     table:ods_online_log, keep order:false                                                                                                                                                                           

the time waste on group by statement.

alivxxx commented 5 years ago

I think it is caused by the unparalleled hash agg. PTAL @XuHuaiyu

shenli commented 5 years ago

@vkingnew Could you show us the result of show variables like "%tidb_%_concurrency";?

XuHuaiyu commented 5 years ago

@vkingnew Can you provide us the schema of table:ods_online_log?

XuHuaiyu commented 5 years ago

And the result of show variables like "%tidb%size%"

zz-jason commented 5 years ago

count(distinct count_0) can not be executed parallelly.

vkingnew commented 5 years ago

show variables like "%tidb_%_concurrency"

Variable_name Value


tidb_hashagg_partial_concurrency 4
tidb_index_serial_scan_concurrency 1
tidb_hash_join_concurrency 5
tidb_hashagg_final_concurrency 4
tidb_distsql_scan_concurrency 15
tidb_index_lookup_concurrency 4
tidb_index_lookup_join_concurrency 4
tidb_projection_concurrency 4
tidb_checksum_table_concurrency 4
tidb_build_stats_concurrency 4

SHOW VARIABLES LIKE "%tidb%size%"; Variable_name Value


tidb_max_chunk_size 32
tidb_index_lookup_size 20000
tidb_index_join_batch_size 25000
tidb_dml_batch_size 20000
tidb_ddl_reorg_batch_size 1024

show create table ods_online_log\G 1. row Table: ods_online_log Create Table: CREATE TABLE ods_online_log ( id bigint(20) NOT NULL COMMENT 'id', app_id varchar(100) NOT NULL COMMENT 'ID', page_id varchar(100) NOT NULL COMMENT 'pageID', page_url varchar(500) DEFAULT NULL COMMENT '', data_time int(11) DEFAULT NULL, log_type int(4) NOT NULL COMMENT '', device_id varchar(100) DEFAULT NULL COMMENT '', device_type varchar(100) DEFAULT NULL COMMENT ''', os varchar(100) DEFAULT NULL COMMENT '', os_version varchar(100) DEFAULT NULL COMMENT '', ip varchar(100) DEFAULT NULL COMMENT '', browser_type varchar(100) DEFAULT NULL COMMENT '', browser_version varchar(100) DEFAULT NULL COMMENT '', app_type varchar(100) DEFAULT NULL COMMENT '', app_version varchar(100) DEFAULT NULL COMMENT '', city_id int(10) NOT NULL COMMENT '', user_id varchar(100) DEFAULT NULL COMMENT '', uuid varchar(100) DEFAULT NULL COMMENT '', request_time varchar(100) NOT NULL COMMENT '', session_id varchar(100) NOT NULL COMMENT '', ref_page_id varchar(100) DEFAULT NULL COMMENT '', first_request_flag int(4) DEFAULT NULL COMMENT '', last_request_flag int(4) DEFAULT NULL COMMENT '', url_request_seq_num int(11) DEFAULT NULL COMMENT '', business text DEFAULT NULL COMMENT '', create_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, update_time timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP, page_duration varchar(128) DEFAULT NULL COMMENT '', page_tag varchar(128) DEFAULT NULL COMMENT '', PRIMARY KEY (id), KEY idx_logDay (data_time), KEY ix_createtime (create_time), KEY ix_createtime_appid_version (create_time,app_id,app_version) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci +---------------------+--------------+------+------+-------------------+-----------------------------+

RUN environment:

dmidecode|grep "System Information" -A9|egrep "Manufacturer|Product"

    Manufacturer: HP
    Product Name: ProLiant DL380 Gen9

cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c

 40  Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz

cat /proc/cpuinfo| grep "physical id"| sort| uniq| wc -l

2

cat /proc/cpuinfo| grep "cpu cores"| uniq

cpu cores : 10

cat /proc/cpuinfo| grep "processor"| wc -l

40 @lamxTyler @shenli @XuHuaiyu @zz-jason

XuHuaiyu commented 5 years ago

we need the result of show create table ods_online_log;

XuHuaiyu commented 5 years ago

What's the number of the CPU cores of the machines where tidb-server runs on?

XuHuaiyu commented 5 years ago

Since the SQL contains a count(distinct count_0), for now, the aggregation cannot be pushed down to TiKV to distributedly execute, and the HashAggExec cannot execute parallelly, either.

The index ix_createtime_appid_version equals to create_time for this SQL since create_time is wrapped by a function(GROUP BY DATE_FORMAT(create_time,'%Y-%m-%d'),s.app_id,s.app_version). StreamAgg would not be chosen.

So, it cannot be optimized to finish in 3s with TiDB for now.

The following suggestions may help to accelerate the SQL to some extent:

  1. set @tidb_distsql_scan_concurrency=40;
  2. set @tidb_index_lookup_concurrency=40;
  3. set @tidb_projection_concurrency=40;
vkingnew commented 5 years ago

@XuHuaiyu In deed the index ix_createtime_appid_version equals to create_time,just for test. if set the concurrency equals to CPU processor nums,may lead to high load for The Server. How about optimizing it by TiSpark? just reduce the run time. I tryed many times by execute sql,about 45s,if reduce the time may better.

XuHuaiyu commented 5 years ago
  1. Yep,but you may try to turn up the values of the 3 variables to see whether the execution time can be reduced.
  2. TiSpark may run fast for this SQL, but it needs an individual Spark cluster. I'm not sure whether this fits your problem. If you wanna try TiSpark, this can be referred.
vkingnew commented 5 years ago

@XuHuaiyu these optimizer parameters can be writen into the configure file:/home/tidb/tidb-ansible/conf/tidb.conf to take it effect,not just a session parametes.?

XuHuaiyu commented 5 years ago

Hi, @vkingnew TiDB would not put the xx_concurrency in the configure file at present. You can set the global system vars to avoid setting them again for every session.

e.g.

set @@tidb_distsql_scan_concurrency=40;
ghost commented 4 years ago

I believe this has been improved in recent versions of TiDB (and the hashagg appears to run in parallel). Testcase:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
 id bigint(20) NOT NULL COMMENT 'id' auto_increment,
 app_id varchar(100) NOT NULL COMMENT 'ID',
 page_id varchar(100) NOT NULL COMMENT 'pageID',
 page_url varchar(500) DEFAULT NULL COMMENT '',
 data_time int(11) DEFAULT NULL,
 log_type int(4) NOT NULL COMMENT '',
 device_id varchar(100) DEFAULT NULL COMMENT '',
 device_type varchar(100) DEFAULT NULL COMMENT '',
 os varchar(100) DEFAULT NULL COMMENT '',
 os_version varchar(100) DEFAULT NULL COMMENT '',
 ip varchar(100) DEFAULT NULL COMMENT '',
 browser_type varchar(100) DEFAULT NULL COMMENT '',
 browser_version varchar(100) DEFAULT NULL COMMENT '',
 app_type varchar(100) DEFAULT NULL COMMENT '',
 app_version varchar(100) DEFAULT NULL COMMENT '',
 city_id int(10) NOT NULL COMMENT '',
 user_id varchar(100) DEFAULT NULL COMMENT '',
 uuid varchar(100) DEFAULT NULL COMMENT '',
 request_time varchar(100) NOT NULL COMMENT '',
 session_id varchar(100) NOT NULL COMMENT '',
 ref_page_id varchar(100) DEFAULT NULL COMMENT '',
 first_request_flag int(4) DEFAULT NULL COMMENT '',
 last_request_flag int(4) DEFAULT NULL COMMENT '',
 url_request_seq_num int(11) DEFAULT NULL COMMENT '',
 business text DEFAULT NULL COMMENT '',
 create_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
 update_time timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
 page_duration varchar(128) DEFAULT NULL COMMENT '',
 page_tag varchar(128) DEFAULT NULL COMMENT '',
 PRIMARY KEY (id),
 KEY idx_logDay (data_time),
 KEY ix_createtime (create_time),
 KEY ix_createtime_appid_version (create_time,app_id,app_version)
);

INSERT INTO t1
 SELECT 
 NULL, HEX(RANDOM_BYTES(32)), HEX(RANDOM_BYTES(32)), HEX(RANDOM_BYTES(100)),
 1234, 1234, 'abcd', 'abcd', 'os', 'os version', 'ip address', 'browser type', 'browser version', 
 'app type', HEX(RANDOM_BYTES(32)), 1234, HEX(RANDOM_BYTES(32)), UUID(), 
 UNIX_TIMESTAMP(),
 HEX(RANDOM_BYTES(32)), HEX(RANDOM_BYTES(32)), 1, 1, 1, 'business text', 
 NOW() - INTERVAL FLOOR(RAND()*86400*3) SECOND,
 NOW() - INTERVAL FLOOR(RAND()*86400*3) SECOND,
 RAND()*1000,
 'page tag' FROM dual;

# Insert ~1M rows

for i in `seq 1 100`; do 
 mysql test -e "INSERT INTO t1
 SELECT 
 NULL, HEX(RANDOM_BYTES(32)), HEX(RANDOM_BYTES(32)), HEX(RANDOM_BYTES(100)),
 1234, 1234, 'abcd', 'abcd', 'os', 'os version', 'ip address', 'browser type', 'browser version', 
 'app type', HEX(RANDOM_BYTES(32)), 1234, HEX(RANDOM_BYTES(32)), UUID(), 
 UNIX_TIMESTAMP(),
 HEX(RANDOM_BYTES(32)), HEX(RANDOM_BYTES(32)), 1, 1, 1, 'business text', 
 NOW() - INTERVAL FLOOR(RAND()*86400*3) SECOND,
 NOW() - INTERVAL FLOOR(RAND()*86400*3) SECOND,
 RAND()*1000,
 'page tag'
FROM t1 a JOIN t1 b JOIN t1 c LIMIT 10000";
 echo -n ".";
done;

SELECT SLEEP(1);
ANALYZE TABLE t1;

When I execute in tidb 2.1.5, it takes about 1.82 seconds:

tidb2.1> EXPLAIN ANALYZE SELECT   DATE_FORMAT(create_time,'%Y-%m-%d'),   app_id,   app_version,   COUNT(id),   COUNT(DISTINCT IFNULL(user_id,uuid)) users,   SUM(IFNULL(page_duration,0)) FROM t1 WHERE create_time  >= DATE_SUB(CURDATE(),INTERVAL 3 DAY) GROUP BY DATE_FORMAT(create_time,'%Y-%m-%d'),app_id,app_version;
+---------------------------+------------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+
| id                        | count      | task | operator info                                                                                                                                                                                                                            | execution info                              |
+---------------------------+------------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+
| Projection_5              | 968808.00  | root | date_format(test.t1.create_time, "%Y-%m-%d"), test.t1.app_id, test.t1.app_version, 3_col_0, 3_col_1, 3_col_2                                                                                                                             | time:1.818824209s, loops:14, rows:398       |
| └─HashAgg_6               | 968808.00  | root | group by:col_6, col_7, col_8, funcs:count(col_0), count(distinct col_1), sum(col_2), firstrow(col_3), firstrow(col_4), firstrow(col_5)                                                                                                   | time:1.818591973s, loops:14, rows:398       |
|   └─Projection_28         | 1211010.00 | root | test.t1.id, ifnull(test.t1.user_id, test.t1.uuid), cast(ifnull(test.t1.page_duration, "0")), test.t1.app_id, test.t1.app_version, test.t1.create_time, date_format(test.t1.create_time, "%Y-%m-%d"), test.t1.app_id, test.t1.app_version | time:708.572663ms, loops:30346, rows:971010 |
|     └─TableReader_13      | 1211010.00 | root | data:Selection_10                                                                                                                                                                                                                        | time:1.0724969s, loops:30346, rows:971010   |
|       └─Selection_10      | 1211010.00 | cop  | ge(test.t1.create_time, 2020-07-31)                                                                                                                                                                                                      |                                             |
|         └─TableScan_9     | 1211010.00 | cop  | table:t1, range:[-inf,+inf], keep order:false                                                                                                                                                                                            |                                             |
+---------------------------+------------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------+
6 rows in set (1.82 sec)

The same query in master is 0.94 seconds:

> EXPLAIN ANALYZE SELECT   DATE_FORMAT(create_time,'%Y-%m-%d'),   app_id,   app_version,   COUNT(id),   COUNT(DISTINCT IFNULL(user_id,uuid)) users,   SUM(IFNULL(page_duration,0)) FROM t1 WHERE create_time  >= DATE_SUB(CURDATE(),INTERVAL 3 DAY) GROUP BY DATE_FORMAT(create_time,'%Y-%m-%d'),app_id,app_version;
+-------------------------------+------------+---------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+------+
| id                            | estRows    | actRows | task      | access object | execution info                                                                                                                                                           | operator info                                                                                                                                                                                                                                                                                       | memory                | disk |
+-------------------------------+------------+---------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+------+
| Projection_5                  | 97832.00   | 397     | root      |               | time:940.624254ms, loops:2, Concurrency:5                                                                                                                                | date_format(test.t1.create_time, %Y-%m-%d)->Column#33, test.t1.app_id, test.t1.app_version, Column#30, Column#31, Column#32                                                                                                                                                                         | 100.216796875 KB      | N/A  |
| └─HashAgg_6                   | 97832.00   | 397     | root      |               | time:940.368456ms, loops:2                                                                                                                                               | group by:Column#41, Column#42, Column#43, funcs:count(Column#35)->Column#30, funcs:count(distinct Column#36)->Column#31, funcs:sum(Column#37)->Column#32, funcs:firstrow(Column#38)->test.t1.app_id, funcs:firstrow(Column#39)->test.t1.app_version, funcs:firstrow(Column#40)->test.t1.create_time | 480.1513671875 KB     | N/A  |
|   └─Projection_16             | 1031010.00 | 971010  | root      |               | time:112.533319ms, loops:953, Concurrency:5                                                                                                                              | test.t1.id, ifnull(test.t1.user_id, test.t1.uuid)->Column#36, cast(ifnull(test.t1.page_duration, 0), double BINARY)->Column#37, test.t1.app_id, test.t1.app_version, test.t1.create_time, date_format(test.t1.create_time, %Y-%m-%d)->Column#41, test.t1.app_id, test.t1.app_version                | 3.425243377685547 MB  | N/A  |
|     └─TableReader_9           | 1031010.00 | 971010  | root      |               | time:132.53585ms, loops:953, rpc num: 10, rpc max:375.812392ms, min:108.130893ms, avg:280.077689ms, p80:375.799552ms, p95:375.812392ms, proc keys max:108221, p95:108221 | data:Selection_8                                                                                                                                                                                                                                                                                    | 248.74689292907715 MB | N/A  |
|       └─Selection_8           | 1031010.00 | 971010  | cop[tikv] |               | proc max:192ms, min:80ms, p80:192ms, p95:192ms, iters:992, tasks:10                                                                                                      | ge(test.t1.create_time, 2020-07-31)                                                                                                                                                                                                                                                                 | N/A                   | N/A  |
|         └─TableFullScan_7     | 1031010.00 | 971010  | cop[tikv] | table:t1      | proc max:172ms, min:56ms, p80:164ms, p95:172ms, iters:992, tasks:10                                                                                                      | keep order:false                                                                                                                                                                                                                                                                                    | N/A                   | N/A  |
+-------------------------------+------------+---------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+------+
6 rows in set (0.94 sec)

I am using the defaults for concurrency.

I am going to close this issue as stale, but please feel free to re-open if you have additional questions. Thanks!