pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
37.04k stars 5.82k forks source link

The response time is not reasonable when 1 tiflash is hanging #39686

Closed JaySon-Huang closed 1 year ago

JaySon-Huang commented 1 year ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

When all tiflash instances are normal, the response time is about 7.5 sec in my env. After 1 tiflash instance hang, the response time of executing query 41 may slightly degrade to 8 sec or so

3. What did you see instead (Required)

After 1 tiflash instance hang, the response time increase from about 7.5 sec to more than 133 sec

tpc-ds query 41
SELECT Distinct(i_product_name) 
FROM   item i1 
WHERE  i_manufact_id BETWEEN 765 AND 765 + 40 
       AND (SELECT Count(*) AS item_cnt 
            FROM   item 
            WHERE  ( i_manufact = i1.i_manufact 
                     AND ( ( i_category = 'Women' 
                             AND ( i_color = 'dim' 
                                    OR i_color = 'green' ) 
                             AND ( i_units = 'Gross' 
                                    OR i_units = 'Dozen' ) 
                             AND ( i_size = 'economy' 
                                    OR i_size = 'petite' ) ) 
                            OR ( i_category = 'Women' 
                                 AND ( i_color = 'navajo' 
                                        OR i_color = 'aquamarine' ) 
                                 AND ( i_units = 'Case' 
                                        OR i_units = 'Unknown' ) 
                                 AND ( i_size = 'large' 
                                        OR i_size = 'N/A' ) ) 
                            OR ( i_category = 'Men' 
                                 AND ( i_color = 'indian' 
                                        OR i_color = 'dark' ) 
                                 AND ( i_units = 'Oz' 
                                        OR i_units = 'Lb' ) 
                                 AND ( i_size = 'extra large' 
                                        OR i_size = 'small' ) ) 
                            OR ( i_category = 'Men' 
                                 AND ( i_color = 'peach' 
                                        OR i_color = 'purple' ) 
                                 AND ( i_units = 'Tbl' 
                                        OR i_units = 'Bunch' ) 
                                 AND ( i_size = 'economy' 
                                        OR i_size = 'petite' ) ) ) ) 
                    OR ( i_manufact = i1.i_manufact 
                         AND ( ( i_category = 'Women' 
                                 AND ( i_color = 'orchid' 
                                        OR i_color = 'peru' ) 
                                 AND ( i_units = 'Carton' 
                                        OR i_units = 'Cup' ) 
                                 AND ( i_size = 'economy' 
                                        OR i_size = 'petite' ) ) 
                                OR ( i_category = 'Women' 
                                     AND ( i_color = 'violet' 
                                            OR i_color = 'papaya' ) 
                                     AND ( i_units = 'Ounce' 
                                            OR i_units = 'Box' ) 
                                     AND ( i_size = 'large' 
                                            OR i_size = 'N/A' ) ) 
                                OR ( i_category = 'Men' 
                                     AND ( i_color = 'drab' 
                                            OR i_color = 'grey' ) 
                                     AND ( i_units = 'Each' 
                                            OR i_units = 'N/A' ) 
                                     AND ( i_size = 'extra large' 
                                            OR i_size = 'small' ) ) 
                                OR ( i_category = 'Men' 
                                     AND ( i_color = 'chocolate' 
                                            OR i_color = 'antique' ) 
                                     AND ( i_units = 'Dram' 
                                            OR i_units = 'Gram' ) 
                                     AND ( i_size = 'economy' 
                                            OR i_size = 'petite' ) ) ) )) > 0 
ORDER  BY i_product_name
LIMIT 100; 
-- all tiflash instances are normal
mysql tpcds50> source /data1/jaysonhuang/tpcds_41.sql;
+--------------------------------------+-------------+---------+--------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
| id                                   | estRows     | actRows | task         | access object | execution info                                                                                                                                                                                                                                                                                                                                                          | operator info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | memory    | disk |
+--------------------------------------+-------------+---------+--------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
| Projection_16                        | 100.00      | 2       | root         |               | time:7.59s, loops:2, Concurrency:OFF                                                                                                                                                                                                                                                                                                                                    | tpcds50.item.i_product_name                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | 1.24 KB   | N/A  |
| └─TopN_19                            | 100.00      | 2       | root         |               | time:7.59s, loops:2                                                                                                                                                                                                                                                                                                                                                     | tpcds50.item.i_product_name, offset:0, count:100                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | 2.50 KB   | N/A  |
|   └─HashAgg_24                       | 802.44      | 2       | root         |               | time:7.59s, loops:4, partial_worker:{wall_time:7.589680249s, concurrency:5, task_num:1, tot_wait:37.948143481s, tot_exec:29.041µs, tot_time:37.948189248s, max:7.589647633s, p95:7.589647633s}, final_worker:{wall_time:7.589702459s, concurrency:5, task_num:2, tot_wait:37.948279858s, tot_exec:93.953µs, tot_time:37.948378205s, max:7.589681959s, p95:7.589681959s} | group by:tpcds50.item.i_product_name, funcs:firstrow(tpcds50.item.i_product_name)->tpcds50.item.i_product_name, funcs:firstrow(tpcds50.item.i_product_name)->tpcds50.item.i_product_name                                                                                                                                                                                                                                                                                                                             | 26.0 KB   | N/A  |
|     └─Apply_26                       | 804.33      | 2       | root         |               | time:7.59s, loops:2, Concurrency:OFF, cache:ON, cacheHitRatio:82.174%                                                                                                                                                                                                                                                                                                   | CARTESIAN inner join                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | 2.54 KB   | N/A  |
|       ├─TableReader_32(Build)        | 804.33      | 791     | root         |               | time:12.1ms, loops:3, cop_task: {num: 1, max: 12.3ms, proc_keys: 0, rpc_num: 1, rpc_time: 12.2ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}                                                                                                                                                                                                                  | data:Selection_31                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | 43.6 KB   | N/A  |
|       │ └─Selection_31               | 804.33      | 791     | cop[tiflash] |               | tiflash_task:{time:9.53ms, loops:1, threads:1}                                                                                                                                                                                                                                                                                                                          | ge(tpcds50.item.i_manufact_id, 765), le(tpcds50.item.i_manufact_id, 805)                                                                                                                                                                                                                                                                                                                                                                                                                                             | N/A       | N/A  |
|       │   └─TableFullScan_30         | 62000.00    | 62000   | cop[tiflash] | table:i1      | tiflash_task:{time:8.53ms, loops:1, threads:1}                                                                                                                                                                                                                                                                                                                          | keep order:false                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | N/A       | N/A  |
|       └─Selection_33(Probe)          | 643.46      | 2       | root         |               | time:7.12s, loops:143                                                                                                                                                                                                                                                                                                                                                   | gt(Column#45, 0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | 380 Bytes | N/A  |
|         └─HashAgg_55                 | 804.33      | 141     | root         |               | time:7.12s, loops:284, partial_worker:{wall_time:7.109935754s, concurrency:5, task_num:2, tot_wait:35.51418698s, tot_exec:38.455µs, tot_time:35.515316306s, max:75.075552ms, p95:52.605628ms}, final_worker:{wall_time:7.117353059s, concurrency:5, task_num:2, tot_wait:35.540632236s, tot_exec:1.641804ms, tot_time:35.54291714s, max:75.110395ms, p95:52.658243ms}   | funcs:count(Column#48)->Column#45                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | 9.86 KB   | N/A  |
|           └─TableReader_57           | 804.33      | 2       | root         |               | time:7.1s, loops:143, cop_task: {num: 143, max: 0s, min: 0s, avg: 0s, p95: 0s, copr_cache_hit_ratio: 0.00}                                                                                                                                                                                                                                                              | data:ExchangeSender_56                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | N/A       | N/A  |
|             └─ExchangeSender_56      | 804.33      | 2       | mpp[tiflash] |               | tiflash_task:{proc max:72.9ms, min:45.5ms, avg: 48.5ms, p80:49.8ms, p95:50.8ms, iters:2, tasks:141, threads:141}                                                                                                                                                                                                                                                        | ExchangeType: PassThrough                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | N/A       | N/A  |
|               └─HashAgg_38           | 804.33      | 2       | mpp[tiflash] |               | tiflash_task:{proc max:72.9ms, min:44.5ms, avg: 48.5ms, p80:49.8ms, p95:50.8ms, iters:2, tasks:141, threads:141}                                                                                                                                                                                                                                                        | funcs:count(1)->Column#48                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | N/A       | N/A  |
|                 └─Selection_54       | 21.47       | 2       | mpp[tiflash] |               | tiflash_task:{proc max:72.9ms, min:44.5ms, avg: 48.5ms, p80:49.8ms, p95:50.8ms, iters:2, tasks:141, threads:141}                                                                                                                                                                                                                                                        | or(and(eq(tpcds50.item.i_manufact, tpcds50.item.i_manufact), or(or(and(and(eq(tpcds50.item.i_category, "Women"), or(eq(tpcds50.item.i_color, "dim"), eq(tpcds50.item.i_color, "green"))), and(or(eq(tpcds50.item.i_units, "Gross"), eq(tpcds50.item.i_units, "Dozen")), or(eq(tpcds50.item.i_size, "economy"), eq(tpcds50.item.i_size, "petite")))), and(and(eq(tpcds50.item.i_category, "Women"), or(eq(tpcds50.item.i_color, "navajo"), eq(tpcds50.item.i_color, "aquamarine"))), and(or(eq(tpcds50.item.i_unit... | N/A       | N/A  |
|                   └─TableFullScan_53 | 49868460.00 | 8742000 | mpp[tiflash] | table:item    | tiflash_task:{proc max:14.6ms, min:7.65ms, avg: 10.6ms, p80:11.7ms, p95:12.9ms, iters:141, tasks:141, threads:141}                                                                                                                                                                                                                                                      | keep order:false                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | N/A       | N/A  |
+--------------------------------------+-------------+---------+--------------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
14 rows in set
Time: 7.785s

-- 1 tiflash instance is hanging
mysql tpcds50> source /data1/jaysonhuang/tpcds_41.sql;
+--------------------------------------+-------------+---------+--------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
| id                                   | estRows     | actRows | task         | access object | execution info                                                                                                                                                                                                                                                                                                                                                                                      | operator info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | memory    | disk |
+--------------------------------------+-------------+---------+--------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
| Projection_16                        | 100.00      | 2       | root         |               | time:2m13.4s, loops:2, Concurrency:OFF                                                                                                                                                                                                                                                                                                                                                              | tpcds50.item.i_product_name                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | 1.49 KB   | N/A  |
| └─TopN_19                            | 100.00      | 2       | root         |               | time:2m13.4s, loops:2                                                                                                                                                                                                                                                                                                                                                                               | tpcds50.item.i_product_name, offset:0, count:100                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | 2.50 KB   | N/A  |
|   └─HashAgg_24                       | 802.44      | 2       | root         |               | time:2m13.4s, loops:4, partial_worker:{wall_time:2m13.434801233s, concurrency:5, task_num:1, tot_wait:11m7.171380659s, tot_exec:476.314µs, tot_time:11m7.171871053s, max:2m13.434764347s, p95:2m13.434764347s}, final_worker:{wall_time:2m13.434820576s, concurrency:5, task_num:2, tot_wait:11m7.173929374s, tot_exec:69.09µs, tot_time:11m7.174002783s, max:2m13.434805011s, p95:2m13.434805011s} | group by:tpcds50.item.i_product_name, funcs:firstrow(tpcds50.item.i_product_name)->tpcds50.item.i_product_name, funcs:firstrow(tpcds50.item.i_product_name)->tpcds50.item.i_product_name                                                                                                                                                                                                                                                                                                                             | 30.5 KB   | N/A  |
|     └─Apply_26                       | 804.33      | 2       | root         |               | time:2m13.4s, loops:2, Concurrency:OFF, cache:ON, cacheHitRatio:82.174%                                                                                                                                                                                                                                                                                                                             | CARTESIAN inner join                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | 2.54 KB   | N/A  |
|       ├─TableReader_32(Build)        | 804.33      | 791     | root         |               | time:10.5ms, loops:3, cop_task: {num: 1, max: 10.6ms, proc_keys: 0, rpc_num: 1, rpc_time: 10.5ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}                                                                                                                                                                                                                                              | data:Selection_31                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | 43.6 KB   | N/A  |
|       │ └─Selection_31               | 804.33      | 791     | cop[tiflash] |               | tiflash_task:{time:8.21ms, loops:1, threads:1}                                                                                                                                                                                                                                                                                                                                                      | ge(tpcds50.item.i_manufact_id, 765), le(tpcds50.item.i_manufact_id, 805)                                                                                                                                                                                                                                                                                                                                                                                                                                             | N/A       | N/A  |
|       │   └─TableFullScan_30         | 62000.00    | 62000   | cop[tiflash] | table:i1      | tiflash_task:{time:7.21ms, loops:1, threads:1}                                                                                                                                                                                                                                                                                                                                                      | keep order:false                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | N/A       | N/A  |
|       └─Selection_33(Probe)          | 643.46      | 2       | root         |               | time:7.2s, loops:143                                                                                                                                                                                                                                                                                                                                                                                | gt(Column#45, 0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | 380 Bytes | N/A  |
|         └─HashAgg_55                 | 804.33      | 141     | root         |               | time:7.2s, loops:284, partial_worker:{wall_time:7.191126729s, concurrency:5, task_num:2, tot_wait:35.92010796s, tot_exec:11.244µs, tot_time:35.921152846s, max:55.082058ms, p95:53.362026ms}, final_worker:{wall_time:7.202088984s, concurrency:5, task_num:2, tot_wait:35.946815052s, tot_exec:1.677262ms, tot_time:35.949160529s, max:55.111185ms, p95:53.389755ms}                               | funcs:count(Column#48)->Column#45                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | 9.86 KB   | N/A  |
|           └─TableReader_57           | 804.33      | 2       | root         |               | time:7.18s, loops:143, cop_task: {num: 143, max: 0s, min: 0s, avg: 0s, p95: 0s, copr_cache_hit_ratio: 0.00}                                                                                                                                                                                                                                                                                         | data:ExchangeSender_56                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | N/A       | N/A  |
|             └─ExchangeSender_56      | 804.33      | 2       | mpp[tiflash] |               | tiflash_task:{proc max:53.4ms, min:46.2ms, avg: 49ms, p80:50.4ms, p95:51.6ms, iters:2, tasks:141, threads:141}                                                                                                                                                                                                                                                                                      | ExchangeType: PassThrough                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | N/A       | N/A  |
|               └─HashAgg_38           | 804.33      | 2       | mpp[tiflash] |               | tiflash_task:{proc max:53.4ms, min:46.2ms, avg: 49ms, p80:50.4ms, p95:51.6ms, iters:2, tasks:141, threads:141}                                                                                                                                                                                                                                                                                      | funcs:count(1)->Column#48                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | N/A       | N/A  |
|                 └─Selection_54       | 21.47       | 2       | mpp[tiflash] |               | tiflash_task:{proc max:53.4ms, min:46.2ms, avg: 49ms, p80:50.4ms, p95:51.6ms, iters:2, tasks:141, threads:141}                                                                                                                                                                                                                                                                                      | or(and(eq(tpcds50.item.i_manufact, tpcds50.item.i_manufact), or(or(and(and(eq(tpcds50.item.i_category, "Women"), or(eq(tpcds50.item.i_color, "dim"), eq(tpcds50.item.i_color, "green"))), and(or(eq(tpcds50.item.i_units, "Gross"), eq(tpcds50.item.i_units, "Dozen")), or(eq(tpcds50.item.i_size, "economy"), eq(tpcds50.item.i_size, "petite")))), and(and(eq(tpcds50.item.i_category, "Women"), or(eq(tpcds50.item.i_color, "navajo"), eq(tpcds50.item.i_color, "aquamarine"))), and(or(eq(tpcds50.item.i_unit... | N/A       | N/A  |
|                   └─TableFullScan_53 | 49868460.00 | 8742000 | mpp[tiflash] | table:item    | tiflash_task:{proc max:14.4ms, min:9.3ms, avg: 11.7ms, p80:12.5ms, p95:13.7ms, iters:141, tasks:141, threads:141}                                                                                                                                                                                                                                                                                   | keep order:false                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | N/A       | N/A  |
+--------------------------------------+-------------+---------+--------------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+------+
14 rows in set
Time: 133.644s

4. What is your TiDB version? (Required)

6.5.0

JaySon-Huang commented 1 year ago

When a tiflash instance is hanging (rather than process down), each time when tidb executes a MPP tasks, it takes about 2 sec to detect whether the tiflash instance is alive or not. And Apply_26 will cause serval MPP tasks and takes too much time in detecting tiflash alive. /cc @windtalker

JaySon-Huang commented 1 year ago

Workaround: set tidb_allow_mpp = 0 to bypass the tiflash instance alive detecting.

JaySon-Huang commented 1 year ago

/label sig/execution

ti-chi-bot commented 1 year ago

@JaySon-Huang: The label(s) sig/execution cannot be applied. These labels are supported: fuzz/sqlancer, challenge-program, compatibility-breaker, first-time-contributor, contribution, require-LGT3, good first issue, correctness, duplicate, proposal, security, needs-more-info, needs-cherry-pick-release-4.0, needs-cherry-pick-release-5.0, needs-cherry-pick-release-5.1, needs-cherry-pick-release-5.2, needs-cherry-pick-release-5.3, needs-cherry-pick-release-5.4, needs-cherry-pick-release-6.0, needs-cherry-pick-release-6.1, needs-cherry-pick-release-6.2, needs-cherry-pick-release-6.3, needs-cherry-pick-release-6.4, needs-cherry-pick-release-6.5, affects-4.0, affects-5.0, affects-5.1, affects-5.2, affects-5.3, affects-5.4, affects-6.0, affects-6.1, affects-6.2, affects-6.3, affects-6.4, affects-6.5, may-affects-4.0, may-affects-5.0, may-affects-5.1, may-affects-5.2, may-affects-5.3, may-affects-5.4, may-affects-6.0, may-affects-6.1, may-affects-6.2, may-affects-6.3, may-affects-6.4, may-affects-6.5.

In response to [this](https://github.com/pingcap/tidb/issues/39686#issuecomment-1339785292): >/label sig/execution Instructions for interacting with me using PR comments are available [here](https://prow.tidb.net/command-help). If you have questions or suggestions related to my behavior, please file an issue against the [ti-community-infra/tichi](https://github.com/ti-community-infra/tichi/issues/new?title=Prow%20issue:) repository.
windtalker commented 1 year ago

Required by customer and need to be fixed in 6.1.x

zanmato1984 commented 1 year ago

This requires a decent reimplementation of current aliveness mechanism. Will be done as an big improvement in the future.