pingcap / tidb

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
https://pingcap.com
Apache License 2.0
37.28k stars 5.84k forks source link

br restore failed when injection tikv network delay #57054

Open Lily2025 opened 2 weeks ago

Lily2025 commented 2 weeks ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

1、run br restore 2、injection tikv network delay 50ms

2. What did you expect to see? (Required)

br restore can succeed

3. What did you see instead (Required)

br restore failed br.log.2024-10-28T03.50.17Z.zip

[32mfatal error: too many concurrent timer firings  runtime stack: runtime.throw({0x6d50fce?, 0xc000a841c0?}) /usr/local/go/src/runtime/panic.go:1067 +0x48 fp=0x7fa572ffcae8 sp=0x7fa572ffcab8 pc=0x22e3788 runtime.(*timer).unlockAndRun(0xc00198b8d0, 0x0?) /usr/local/go/src/runtime/time.go:1076 +0x28e fp=0x7fa572ffcb50 sp=0x7fa572ffcae8 pc=0x22cb26e runtime.(*timers).run(0xc00009b190, 0x8180b97e8b7382) /usr/local/go/src/runtime/time.go:1008 +0xf0 fp=0x7fa572ffcb78 sp=0x7fa572ffcb50 pc=0x22caf90 runtime.(*timers).check(0xc00009b190, 0x100000004?) /usr/local/go/src/runtime/time.go:942 +0x13d fp=0x7fa572ffcbc0 sp=0x7fa572ffcb78 pc=0x22cadbd runtime.findRunnable() /usr/local/go/src/runtime/proc.go:3270 +0x6f fp=0x7fa572ffcd38 sp=0x7fa572ffcbc0 pc=0x22b164f runtime.schedule() /usr/local/go/src/runtime/proc.go:3995 +0xb1 fp=0x7fa572ffcd70 sp=0x7fa572ffcd38 pc=0x22b3411 runtime.park_m(0xc0020856c0) /usr/local/go/src/runtime/proc.go:4102 +0x1eb fp=0x7fa572ffcdc8 sp=0x7fa572ffcd70 pc=0x22b382b runtime.mcall() /usr/local/go/src/runtime/asm_amd64.s:459 +0x4e fp=0x7fa572ffcde0 sp=0x7fa572ffcdc8 pc=0x22ea7ee goroutine 1 gp=0xc0000061c0 m=nil [semacquire, 15 minutes]: runtime.gopark(0x2274b5e?, 0x7fa59067a8e8?, 0x60?, 0xc6?, 0x22de100?) /usr/local/go/src/runtime/proc.go:424 +0xce fp=0xc007c60488 sp=0xc007c60468 pc=0x22e38ae runtime.goparkunlock(...) /usr/local/go/src/runtime/proc.go:430 runtime.semacquire1(0xc01048de50, 0x0, 0x1, 0x0, 0x12) /usr/local/go/src/runtime/sema.go:178 +0x225 fp=0xc007c604f0 sp=0xc007c60488 pc=0x22bf5a5 sync.runtime_Semacquire(0x67aab80?) /usr/local/go/src/runtime/sema.go:71 +0x25 fp=0xc007c60528 sp=0xc007c604f0 pc=0x22e5265 sync.(*WaitGroup).Wait(0xc006c4d920?) /usr/local/go/src/sync/waitgroup.go:118 +0x48 fp=0xc007c60550 sp=0xc007c60528 pc=0x22ffc68 golang.org/x/sync/errgroup.(*Group).Wait(0xc01048de40) /root/go/pkg/mod/golang.org/x/sync@v0.8.0/errgroup/errgroup.go:56 +0x25 fp=0xc007c60570 sp=0xc007c60550 pc=0x3458165 github.com/pingcap/tidb/br/pkg/restore/snap_client.(*SnapClient).restoreSSTFilesInternal(0xc00216aea0, {0x75faa10, 0xc0021ee0f0}, {0xc006db0000, 0xb62, 0xc010456a80?}, {0x75fa888, 0xc006d12340}) /workspace/source/tidb/br/pkg/restore/snap_client/tikv_sender.go:451 +0x2d8 fp=0xc007c60680 sp=0xc007c60570 pc=0x58bf378 github.com/pingcap/tidb/br/pkg/restore/snap_client.(*SnapClient).RestoreSSTFiles(0xc00216aea0, {0x75faa10, 0xc0021ee0f0}, {0xc006db0000, 0xb62, 0x1ed9}, {0x75fa888, 0xc006d12340}) /workspace/source/tidb/br/pkg/restore/snap_client/tikv_sender.go:395 +0xa8 fp=0xc007c606e0 sp=0xc007c60680 pc=0x58bf028 github.com/pingcap/tidb/br/pkg/restore/snap_client.(*SnapClient).RestoreTables(0xc00216aea0, {0x75faa10, 0xc0021ee0f0}, {0x75ce718, 0xb184560}, {0xc005fad200, 0x20, 0x20}, {0xc0046fc000, 0x1ed9, ...}, ...) /workspace/source/tidb/br/pkg/restore/snap_client/tikv_sender.go:316 +0x5f5 fp=0xc007c60870 sp=0xc007c606e0 pc=0x58be6d5 github.com/pingcap/tidb/br/pkg/task.runSnapshotRestore({0x75faa10, 0xc001485b80}, 0xc001bc7600, {0x761b580, 0xc0018c6620}, {0x6ce70f5, 0x10}, 0xc001cea588, 0x0) /workspace/source/tidb/br/pkg/task/restore.go:1095 +0x2d85 fp=0xc007c610b0 sp=0xc007c60870 pc=0x59c5425 github.com/pingcap/tidb/br/pkg/task.RunRestore({0x75faa10, 0xc001485b80}, {0x761b580, 0xc0018c6620}, {0x6ce70f5, 0x10}, 0xc001cea588) /workspace/source/tidb/br/pkg/task/restore.go:704 +0x465 fp=0xc007c619b8 sp=0xc007c610b0 pc=0x59c1a05 main.runRestoreCommand(0xc0019cdb08, {0x6ce70f5, 0x10}) /workspace/source/tidb/br/cmd/br/restore.go:75 +0x74a fp=0xc007c61b90 sp=0xc007c619b8 pc=0x5d9a50a main.newDBRestoreCommand.func1(0xc0019cdb08?, {0xc0019c4800?, 0x4?, 0x6cbb44b?}) /workspace/source/tidb/br/cmd/br/restore.go:195 +0x1f fp=0xc007c61bb8 sp=0xc007c61b90 pc=0x5d9cb3f github.com/spf13/cobra.(*Command).execute(0xc0019cdb08, {0xc0000740e0, 0x8, 0x8}) /root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:985 +0xaaa fp=0xc007c61d40 sp=0xc007c61bb8 pc=0x587c94a github.com/spf13/cobra.(*Command).ExecuteC(0xc001534608) /root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1117 +0x3ff fp=0xc007c61e18 sp=0xc007c61d40 pc=0x587d27f github.com/spf13/cobra.(*Command).Execute(...) /root/go/pkg/mod/github.com/spf13/cobra@v1.8.1/command.go:1041 main.main()

4. What is your TiDB version? (Required)

./br -V Release Version: v8.4.0-alpha-475-g7292117502 Git Commit Hash: 729211750239904fbfdd78eba99cc3e8f754d5c6 Git Branch: HEAD Go Version: go1.23.2 UTC Build Time: 2024-10-27 06:11:32 Race Enabled: false

Lily2025 commented 2 weeks ago

/assign Leavrth

Lily2025 commented 2 weeks ago

/severity major

Leavrth commented 2 weeks ago

This issue is introduced into Go 1.23.2, and it will be fixed in Go 1.23.3.