pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
37.01k stars 5.82k forks source link

br backup failed when pd rolling restart #53712

Open Lily2025 opened 4 months ago

Lily2025 commented 4 months ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

1、run br backup 2、pd rolling restart br log: br.log.2024-05-30T09.45.36Z.tar.gz

2. What did you expect to see? (Required)

br backup can success

3. What did you see instead (Required)

br backup failed [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240528122050-634e05a87ee0/client.go:1602\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240528122050-634e05a87ee0/client.go:1193\ngithub.com/pingcap/tidb/br/pkg/conn/util.GetAllTiKVStores\n\t/workspace/source/tidb/br/pkg/conn/util/util.go:48\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/workspace/source/tidb/br/pkg/conn/conn.go:83\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry.func1\n\t/workspace/source/tidb/br/pkg/utils/retry.go:217\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetryV2[...]\n\t/workspace/source/tidb/br/pkg/utils/retry.go:235\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/workspace/source/tidb/br/pkg/utils/retry.go:216\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/workspace/source/tidb/br/pkg/conn/conn.go:80\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/workspace/source/tidb/br/pkg/backup/client.go:917\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func2\n\t/workspace/source/tidb/br/pkg/backup/client.go:876\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/workspace/source/tidb/br/pkg/utils/worker.go:76\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/root/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"] [unit-name="range start:7480000000000000f15f69800000000000000100 end:7480000000000000f15f698000000000000001fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240528122050-634e05a87ee0/client.go:1602\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240528122050-634e05a87ee0/client.go:1193\ngithub.com/pingcap/tidb/br/pkg/conn/util.GetAllTiKVStores\n\t/workspace/source/tidb/br/pkg/conn/util/util.go:48\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/workspace/source/tidb/br/pkg/conn/conn.go:83\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry.func1\n\t/workspace/source/tidb/br/pkg/utils/retry.go:217\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetryV2[...]\n\t/workspace/source/tidb/br/pkg/utils/retry.go:235\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/workspace/source/tidb/br/pkg/utils/retry.go:216\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/workspace/source/tidb/br/pkg/conn/conn.go:80\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/workspace/source/tidb/br/pkg/backup/client.go:917\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func2\n\t/workspace/source/tidb/br/pkg/backup/client.go:876\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/workspace/source/tidb/br/pkg/utils/worker.go:76\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/root/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"] [unit-name="range start:74800000000000010a5f69800000000000000100 end:74800000000000010a5f698000000000000001fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240528122050-634e05a87ee0/client.go:1602\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/root/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20240528122050-634e05a87ee0/client.go:1193

4. What is your TiDB version? (Required)

./tidb-server -V Release Version: v7.5.2 Edition: Community Git Commit Hash: 39ea2b30d32ab8cf486f30ca318cf2e4bd99eaef Git Branch: HEAD UTC Build Time: 2024-05-29 15:07:12 GoVersion: go1.21.10 Race Enabled: false Check Table Before Drop: false Store: unistore 2024-05-30T14:14:32.686+0800

Lily2025 commented 4 months ago

/severity major /assign 3pointer

3pointer commented 4 months ago

The error is can not find a valid target peer. which means br cannot find region from PD client. The only thing we can do is add more times retry. But I think we should find why PD client cannot get the region during rolling restart.