pingcap / tidb

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
https://pingcap.com
Apache License 2.0

add index fails or rolls back with error “Error 1105 (HY000): etcdserver: request timed out” when injecting a 1s IO delay on the PD leader for 2m #48204

Open Lily2025 opened 1 year ago

Lily2025 commented 1 year ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

tidb_enable_dist_task='on'
1. Run the workload.
2. Add an index to one table.
3. Inject a 1s IO delay on the PD leader, lasting 2m.

case config: tag: "ha-test-add-index" workloads:

2. What did you expect to see? (Required)

add index should succeed

3. What did you see instead (Required)

add index failed with error “Error 1105 (HY000): etcdserver: request timed out” when a 1s IO delay was injected on the PD leader for 2m

add index failed at 2023-11-01 22:59:39 (Error 1105 (HY000): etcdserver: request timed out)
operatorLogs:
[2023-11-01 22:55:42] ###### start adding index
alter table sbtest1 add index index_test_1698850542668 (c)
[2023-11-01 22:55:42] ###### wait for ddl job finish
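The numeric suffix of the index name in the operator log looks like a Unix timestamp in milliseconds. That is an observation from the values, not something the report states. The following Python sketch checks it against the logged start time and works out how long the DDL job ran before failing. The helper name `index_suffix_to_time` is made up for illustration, and the +08:00 offset is an assumption, since this comment does not state the operator log's time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the operator log timestamps are in +08:00.
CST = timezone(timedelta(hours=8))


def index_suffix_to_time(index_name: str) -> datetime:
    """Interpret the trailing number of the index name as Unix milliseconds."""
    millis = int(index_name.rsplit("_", 1)[1])
    # Drop the millisecond part so the result lines up with the log's precision.
    return datetime.fromtimestamp(millis // 1000, tz=CST)


# Values copied from the operator log in this report.
created = index_suffix_to_time("index_test_1698850542668")
failed = datetime(2023, 11, 1, 22, 59, 39, tzinfo=CST)

elapsed_seconds = int((failed - created).total_seconds())

print("index name timestamp:", created.strftime("%Y-%m-%d %H:%M:%S"))
print("seconds from start to failure:", elapsed_seconds)
```

Under that assumption the decoded time equals the logged `start adding index` time (22:55:42), and the failure is reported 3m57s later.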

4. What is your TiDB version? (Required)

git hash:5f7b6973b0d730b446d840733f213ad6637bee1f

Lily2025 commented 1 year ago

/assign ywqzzy

ywqzzy commented 1 year ago

https://github.com/tikv/pd/issues/7251

Lily2025 commented 10 months ago

For the 7.6.0 release test (testbed: endless-ha-test-add-index-tps-5941077-1-175):

[2024/01/14 09:25:46.239 +08:00] [INFO] [chaos.go:82] ["Run chaos success"]
[2024/01/14 09:27:46.239 +08:00] [INFO] [chaos.go:94] ["Clean chaos"]

add index failed with error “Error 1105 (HY000): etcdserver: request timed out” when a 500ms IO delay was injected on the PD leader for 2m

add index failed at 2024-01-14 09:27:46 (Error 1105 (HY000): etcdserver: request timed out)
operatorLogs:
[2024-01-14 09:24:45] ###### start adding index
alter table sbtest1 add index index_test_1705195485955 (c)
[2024-01-14 09:24:45] ###### wait for ddl job finish
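To make the timing of this run easier to see, the timestamps quoted in this comment can be lined up with a few lines of Python. This is only a sketch over values copied from the chaos and operator logs. The variable names are made up for illustration, and it assumes all timestamps share the same +08:00 zone.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the logs in this comment (sub-second parts dropped).
ddl_start = datetime.strptime("2024-01-14 09:24:45", FMT)    # "start adding index"
chaos_start = datetime.strptime("2024-01-14 09:25:46", FMT)  # "Run chaos success"
chaos_end = datetime.strptime("2024-01-14 09:27:46", FMT)    # "Clean chaos"
ddl_failed = datetime.strptime("2024-01-14 09:27:46", FMT)   # "add index failed at"

chaos_duration = (chaos_end - chaos_start).total_seconds()
ddl_runtime = (ddl_failed - ddl_start).total_seconds()
failed_after_chaos_start = (ddl_failed - chaos_start).total_seconds()

print(f"chaos lasted {chaos_duration:.0f}s")
print(f"add index ran {ddl_runtime:.0f}s before failing")
print(f"failure came {failed_after_chaos_start:.0f}s after chaos started")
```

With these values the failure is reported in the same second the chaos is cleaned, that is, at the end of the 2-minute delay window.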

tidb logs:
[2024/01/14 09:27:46.648 +08:00] [INFO] [ddl.go:1298] ["DDL job is failed"] [category=ddl] [jobID=545]
[2024/01/14 09:27:46.648 +08:00] [INFO] [tidb.go:286] ["rollbackTxn called due to ddl/autocommit failure"]
[2024/01/14 09:27:46.648 +08:00] [WARN] [session.go:2251] ["run statement failed"] [conn=3581941594] [session_alias=] [schemaVersion=706] [error="[0]etcdserver: request timed out"] [session="{\n \"currDBName\": \"sysbench_64_7000w\",\n \"id\": 3581941594,\n \"status\": 2,\n \"strictMode\": true,\n \"user\": {\n \"Username\": \"root\",\n \"Hostname\": \"10.233.115.8\",\n \"CurrentUser\": false,\n \"AuthUsername\": \"root\",\n \"AuthHostname\": \"%\",\n \"AuthPlugin\": \"mysql_native_password\"\n }\n}"]
[2024/01/14 09:27:46.649 +08:00] [INFO] [conn.go:1155] ["command dispatched failed"] [conn=3581941594] [session_alias=] [connInfo="id:3581941594, addr:10.233.115.8:52454 status:10, collation:utf8mb4_general_ci, user:root"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="alter table sbtest1 add index index_test_1705195485955 (c)"] [txn_mode=PESSIMISTIC] [timestamp=447006765458391215] [err="[0]etcdserver: request timed out\ngithub.com/pingcap/errors.AddStack\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20231212100244-799fae176cfb/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20231212100244-799fae176cfb/juju_adaptor.go:15\ngithub.com/pingcap/tidb/pkg/ddl.(ddl).DoDDLJob\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/ddl.go:1299\ngithub.com/pingcap/tidb/pkg/ddl.(ddl).createIndex\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/ddl_api.go:7511\ngithub.com/pingcap/tidb/pkg/ddl.(ddl).AlterTable\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/ddl/ddl_api.go:3800\ngithub.com/pingcap/tidb/pkg/executor.(DDLExec).executeAlterTable\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/ddl.go:387\ngithub.com/pingcap/tidb/pkg/executor.(DDLExec).Next\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/ddl.go:151\ngithub.com/pingcap/tidb/pkg/executor/internal/exec.Next\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/internal/exec/executor.go:314\ngithub.com/pingcap/tidb/pkg/executor.(ExecStmt).next\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:1252\ngithub.com/pingcap/tidb/pkg/executor.(ExecStmt).handleNoDelayExecutor\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:997\ngithub.com/pingcap/tidb/pkg/executor.(ExecStmt).handleNoDelay\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:823\ngithub.com/pingcap/tidb/pkg/executor.(ExecStmt).Exec\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/adapter.go:598\ngithub.com/pingcap/tidb/pkg/session.runStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/session.go:2380\ngithub.com/pingcap/tidb/pkg/session.(session).ExecuteStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/session/session.go:2239\ngithub.com/pingcap/tidb/pkg/server.(TiDBContext).ExecuteStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/driver_tidb.go:294\ngithub.com/pingcap/tidb/pkg/server.(clientConn).handleStmt\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:2024\ngithub.com/pingcap/tidb/pkg/server.(clientConn).handleQuery\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1775\ngithub.com/pingcap/tidb/pkg/server.(clientConn).dispatch\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1349\ngithub.com/pingcap/tidb/pkg/server.(clientConn).Run\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/conn.go:1122\ngithub.com/pingcap/tidb/pkg/server.(Server).onConn\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/server/server.go:713\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"]
[2024/01/14 09:27:46.649 +08:00] [WARN] [terror.go:249] ["Unknown error class"] [class=0]

tidb-0-2024-01-14T12-03-30.tar.gz tidb-1.tar.gz

cc @JmPotato @ywqzzy

Lily2025 commented 6 months ago

This issue may also cause the add index job to roll back:

[2024/05/01 03:09:40.467 +08:00] [INFO] [ddl.go:1291] ["DDL job is failed"] [category=ddl] [jobID=599]
[2024/05/01 03:09:40.467 +08:00] [INFO] [tidb.go:269] ["rollbackTxn called due to ddl/autocommit failure"]
[2024/05/01 03:09:40.467 +08:00] [WARN] [session.go:2150] ["run statement failed"] [conn=1107305062] [session_alias=] [schemaVersion=963] [error="[0]etcdserver: request timed out"] [session="{\n \"currDBName\": \"sysbench_64_7000w\",\n \"id\": 1107305062,\n \"status\": 2,\n \"strictMode\": true,\n \"user\": {\n \"Username\": \"root\",\n \"Hostname\": \"10.200.53.36\",\n \"CurrentUser\": false,\n \"AuthUsername\": \"root\",\n \"AuthHostname\": \"%\",\n \"AuthPlugin\": \"mysql_native_password\"\n }\n}"]