Assume there are 3 PD nodes, urls = [pd-0, pd-1, pd-2], where pd-0 is disconnected and pd-2 is the leader.
getMembers(pd-0) then always times out, and because leader_mutex is held while it runs, the mutex stays locked for more than pd_timeout seconds.
Any other request, such as getRegionByKey, has to acquire leader_mutex before it is sent, so it also waits more than pd_timeout seconds and times out as well.
Issue: https://github.com/pingcap/tiflash/issues/9243
https://github.com/tikv/client-c/blob/2d791221c64dcfd3bf7c6ba4ce8656ed640a8901/src/pd/Client.cc#L184-L200
https://github.com/tikv/client-c/blob/2d791221c64dcfd3bf7c6ba4ce8656ed640a8901/src/pd/Client.cc#L353-L375
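
The blocking pattern described above can be reproduced in isolation. The following is a minimal sketch, not the client-c code: `MockPdClient`, `updateLeaderHoldingLock`, `updateLeaderOutsideLock`, `leaderLockWait` and `measureWait` are hypothetical names, the RPC is simulated with a sleep, and `pd_timeout` is scaled down to milliseconds. The second update variant (snapshot the urls, probe without the lock) is only one possible way to avoid the stall and is an assumption, not necessarily the fix applied here.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

using std::chrono::milliseconds;
using std::chrono::steady_clock;

// Hypothetical stand-in for the PD client; only the locking pattern matters.
struct MockPdClient {
    std::mutex leader_mutex;
    std::vector<std::string> urls{"pd-0", "pd-1", "pd-2"};
    std::string leader = "pd-2";
    milliseconds pd_timeout{200};      // scaled down from seconds for the demo
    std::atomic<bool> probing{false};  // set once the slow probe has started

    // Simulated RPC: pd-0 is unreachable, so the call blocks for pd_timeout and fails.
    bool getMembers(const std::string& url) {
        if (url == "pd-0") {
            probing = true;
            std::this_thread::sleep_for(pd_timeout);
            return false;
        }
        return true;
    }

    // Pattern from the description: leader_mutex is held across the slow RPC.
    void updateLeaderHoldingLock() {
        std::lock_guard<std::mutex> lk(leader_mutex);
        for (const auto& url : urls)
            if (getMembers(url)) { leader = "pd-2"; return; }
    }

    // Assumed alternative: copy urls under the lock, probe without it,
    // and re-acquire the lock only to publish the result.
    void updateLeaderOutsideLock() {
        std::vector<std::string> snapshot;
        {
            std::lock_guard<std::mutex> lk(leader_mutex);
            snapshot = urls;
        }
        for (const auto& url : snapshot) {
            if (getMembers(url)) {
                std::lock_guard<std::mutex> lk(leader_mutex);
                leader = "pd-2";
                return;
            }
        }
    }

    // What a getRegionByKey-like request does first: take leader_mutex.
    // Returns how long the request waited for the mutex.
    milliseconds leaderLockWait() {
        auto start = steady_clock::now();
        std::lock_guard<std::mutex> lk(leader_mutex);
        return std::chrono::duration_cast<milliseconds>(steady_clock::now() - start);
    }
};

// Runs a leader update in the background and measures how long a
// concurrent request waits for leader_mutex.
milliseconds measureWait(bool hold_lock) {
    MockPdClient client;
    std::thread updater([&client, hold_lock] {
        if (hold_lock) client.updateLeaderHoldingLock();
        else client.updateLeaderOutsideLock();
    });
    while (!client.probing) std::this_thread::yield();  // wait until pd-0 is being probed
    milliseconds waited = client.leaderLockWait();
    updater.join();
    return waited;
}
```

Calling `measureWait(true)` should report a wait close to `pd_timeout`, since the request cannot take `leader_mutex` until the probe of pd-0 gives up, while `measureWait(false)` should return almost immediately because the mutex is free during the probe.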