tikv / pd

Placement driver for TiKV
Apache License 2.0

PD follower not attempting to reconnect after region syncer disconnection #5221

Open JmPotato opened 2 years ago

JmPotato commented 2 years ago

Bug Report

The PD leader hit an error while sending data to a PD follower over the region syncer stream and closed the stream on its side. However, the PD follower did not detect this disconnection and kept its end of the stream alive. Because the disconnection was one-sided, the synchronization of region meta info was interrupted and never resumed.

On the server side, the error is "Unavailable desc = transport is closing", returned from the code below.

https://github.com/tikv/pd/blob/0c1246dd219fd16b4b2ff5108941e5d3e958922d/server/region_syncer/server.go#L258-L264

We need to figure out why this disconnection does not trigger the client-side reconnection code below.

https://github.com/tikv/pd/blob/0c1246dd219fd16b4b2ff5108941e5d3e958922d/server/region_syncer/client.go#L170-L178

What version of PD are you using (pd-server -V)?

v4.0.14

mayjiang0203 commented 2 years ago

/remove-severity major
/severity moderate