Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
   1. Set tidb_enable_dist_task='on'.
   2. Run sysbench.
   3. Add an index to one table.
   4. Inject 50 ms of network latency on the PD leader for 3 minutes.
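The steps above could be scripted roughly as follows. This is only a sketch, not the setup used in the report: the report does not say how the latency was injected, so `tc netem` on the PD leader's host is an assumption, as are the host, port, database name, and sysbench parameters. The index name and column are taken from the operator log. By default the script only prints the commands (`DRY_RUN=1`).

```shell
#!/usr/bin/env bash
# Reproduction sketch. Host, port, database name, sysbench sizes and the
# network interface are assumptions, not values from the report.
set -eu

TIDB_HOST="${TIDB_HOST:-127.0.0.1}"
TIDB_PORT="${TIDB_PORT:-4000}"
PD_IFACE="${PD_IFACE:-eth0}"   # interface on the PD leader host (assumed)
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Send one SQL statement to TiDB through the mysql CLI.
sql() {
  run mysql -h "$TIDB_HOST" -P "$TIDB_PORT" -u root -D sbtest -e "$1"
}

# Step 1: enable the distributed task framework.
enable_dist_task() { sql "SET GLOBAL tidb_enable_dist_task = 'on'"; }

# Step 2: workload; assumes the table was created with `sysbench ... prepare`.
start_sysbench() {
  run sysbench oltp_read_write --db-driver=mysql \
    --mysql-host="$TIDB_HOST" --mysql-port="$TIDB_PORT" --mysql-user=root \
    --mysql-db=sbtest --tables=1 --table-size=100000 --threads=8 --time=300 run
}

# Step 3: the DDL from the operator log.
add_index() { sql "ALTER TABLE sbtest1 ADD INDEX index_test_1722913954314(c)"; }

# Step 4: 50 ms delay for 3 minutes; must run on the PD leader's host or pod.
inject_latency() {
  run tc qdisc add dev "$PD_IFACE" root netem delay 50ms
  run sleep 180
  run tc qdisc del dev "$PD_IFACE" root netem
}

main() {
  enable_dist_task
  start_sysbench &   # keep the workload running in the background
  add_index &        # DDL runs while the latency is injected
  inject_latency
  wait
}

# Nothing runs unless the script is called with the argument "run".
if [ "${1:-}" = "run" ]; then
  main
fi
```

Running the script with `run` and `DRY_RUN=0` executes the commands; in that mode `inject_latency` needs root on the PD leader's host, so in practice it would likely be run separately from the client-side steps.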
2. What did you expect to see? (Required)
add index should succeed
3. What did you see instead (Required)
add index failed with error "Error 1105 (HY000): get TSO failed, tso client is nil"
add index failed at 2024-08-06 11:12:47: Error 1105 (HY000): get TSO failed, tso client is nil
operatorLogs:
[2024-08-06 11:12:34] ###### start adding index
ALTER TABLE sbtest1 ADD INDEX index_test_1722913954314(c)
[2024-08-06 11:12:34] ###### wait for ddl job finish
tidb logs:
[2024/08/06 11:12:45.630 +08:00] [WARN] [pd_service_discovery.go:834] ["[pd] failed to get cluster id"] [url=http://tc-pd-0.tc-pd-peer.endless-ha-test-add-index-tps-7614410-1-895.svc:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 10.233.115.45:2379: i/o timeout\" target:tc-pd-0.tc-pd-peer.endless-ha-test-add-index-tps-7614410-1-895.svc:2379 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 10.233.115.45:2379: i/o timeout\" target:tc-pd-0.tc-pd-peer.endless-ha-test-add-index-tps-7614410-1-895.svc:2379 status:TRANSIENT_FAILURE"]
[2024/08/06 11:12:45.656 +08:00] [INFO] [pd_service_discovery.go:910] ["[pd] cannot update member from this url"] [url=http://tc-pd-0.tc-pd-peer.endless-ha-test-add-index-tps-7614410-1-895.svc:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 10.233.115.45:2379: i/o timeout\" target:tc-pd-0.tc-pd-peer.endless-ha-test-add-index-tps-7614410-1-895.svc:2379 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 10.233.115.45:2379: i/o timeout\" target:tc-pd-0.tc-pd-peer.endless-ha-test-add-index-tps-7614410-1-895.svc:2379 status:TRANSIENT_FAILURE"]
[2024/08/06 11:12:45.658 +08:00] [INFO] [pd_service_discovery.go:1016] ["[pd] switch leader"] [new-leader=http://tc-pd-0.tc-pd-peer.endless-ha-test-add-index-tps-7614410-1-895.svc:2379] [old-leader=]
[2024/08/06 11:12:45.658 +08:00] [INFO] [pd_service_discovery.go:498] ["[pd] init cluster id"] [cluster-id=7399761271018479670]
[2024/08/06 11:12:45.658 +08:00] [WARN] [pd_service_discovery.go:509] ["[pd] failed to check service mode and will check later"] [error="[PD:client:ErrClientGetClusterInfo]error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 10.233.115.45:2379: i/o timeout\" target:tc-pd-0.tc-pd-peer.endless-ha-test-add-index-tps-7614410-1-895.svc:2379 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 10.233.115.45:2379: i/o timeout\" target:tc-pd-0.tc-pd-peer.endless-ha-test-add-index-tps-7614410-1-895.svc:2379 status:TRANSIENT_FAILURE"]
[2024/08/06 11:12:45.711 +08:00] [WARN] [resource_manager_client.go:302] ["[resource_manager] get token stream error"] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 10.233.115.45:2379: i/o timeout\""]
[2024/08/06 11:12:45.711 +08:00] [INFO] [resource_manager_client.go:290] ["[resource manager] exit resource token dispatcher"]
[2024/08/06 11:12:45.711 +08:00] [INFO] [pd_service_discovery.go:550] ["[pd] exit member loop due to context canceled"]
[2024/08/06 11:12:45.711 +08:00] [INFO] [pd_service_discovery.go:637] ["[pd] close pd service discovery client"]
[2024/08/06 11:12:45.712 +08:00] [ERROR] [backend_mgr.go:152] ["build ingest backend failed"] ["job ID"=543] [error="[Lightning:KV:ErrCreateKVClient]create kv client error: [PD:client:ErrClientGetTSO]get TSO failed, tso client is nil"]
[2024/08/06 11:12:45.712 +08:00] [ERROR] [task_executor.go:549] [onError] [task-id=300051] [task-type=backfill] [error="[Lightning:KV:ErrCreateKVClient]create kv client error: [PD:client:ErrClientGetTSO]get TSO failed, tso client is nil"] [stack="github.com/pingcap/tidb/pkg/disttask/framework/taskexecutor.(BaseTaskExecutor).onError\n\t/workspace/source/tidb/pkg/disttask/framework/taskexecutor/task_executor.go:549\ngithub.com/pingcap/tidb/pkg/disttask/framework/taskexecutor.(BaseTaskExecutor).runStep\n\t/workspace/source/tidb/pkg/disttask/framework/taskexecutor/task_executor.go:314\ngithub.com/pingcap/tidb/pkg/disttask/framework/taskexecutor.(BaseTaskExecutor).RunStep\n\t/workspace/source/tidb/pkg/disttask/framework/taskexecutor/task_executor.go:266\ngithub.com/pingcap/tidb/pkg/disttask/framework/taskexecutor.(BaseTaskExecutor).Run\n\t/workspace/source/tidb/pkg/disttask/framework/taskexecutor/task_executor.go:246\ngithub.com/pingcap/tidb/pkg/disttask/framework/taskexecutor.(Manager).startTaskExecutor.func1\n\t/workspace/source/tidb/pkg/disttask/framework/taskexecutor/manager.go:337\ngithub.com/pingcap/tidb/pkg/util.(WaitGroupWrapper).RunWithLog.func1\n\t/workspace/source/tidb/pkg/util/wait_group_wrapper.go:171"]
[2024/08/06 11:12:45.712 +08:00] [ERROR] [task_executor.go:555] ["taskExecutor met first error"] [task-id=300051] [task-type=backfill] [error="[Lightning:KV:ErrCreateKVClient]create kv client error: [PD:client:ErrClientGetTSO]get TSO failed, tso client is nil"]
[2024/08/06 11:12:45.712 +08:00] [INFO] [task_executor.go:309] ["execute task step failed"] [task-id=300051] [task-type=backfill] [step=read-index] [mem-limit-percent=0.7] [server-mem-limit=80%] [resource="[CPU=4, Mem=8GiB]"] [takeTime=1.182030051s] [error="[Lightning:KV:ErrCreateKVClient]create kv client error: [PD:client:ErrClientGetTSO]get TSO failed, tso client is nil"]
[2024/08/06 11:12:45.712 +08:00] [WARN] [terror.go:242] ["Unknown error class"] [class=PD]
[2024/08/06 11:12:45.744 +08:00] [INFO] [task_executor.go:657] ["failed one subtask succeed"] [task-id=300051] [task-type=backfill] [subtask-err="[PD:client:ErrClientGetTSO]get TSO failed, tso client is nil"]
4. What is your TiDB version? (Required)
./tidb-server -V
Release Version: v8.1.1
Edition: Community
Git Commit Hash: 891151b41bda2039bbe7483c57e134834c9226a6
Git Branch: HEAD
UTC Build Time: 2024-08-05 14:17:53
GoVersion: go1.21.10
Race Enabled: false
Check Table Before Drop: false
Store: unistore
2024-08-06T11:08:35.774+0800