Bug Report
We found that the PD leader encountered an error while sending data to the PD follower over the region syncer stream, and closed the stream on its side. However, the PD follower did not detect this disconnection and kept the stream alive. Because of this one-sided disconnection, the synchronization of region meta info was interrupted and never resumed.
The server-side error code is "Unavailable desc = transport is closing", returned from the code below.
https://github.com/tikv/pd/blob/0c1246dd219fd16b4b2ff5108941e5d3e958922d/server/region_syncer/server.go#L258-L264
We need to figure out why this disconnection does not trigger the client-side reconnection code below.
https://github.com/tikv/pd/blob/0c1246dd219fd16b4b2ff5108941e5d3e958922d/server/region_syncer/client.go#L170-L178
What version of PD are you using (pd-server -V)?
v4.0.14