tikv / client-c

The C++ TiKV client used by TiFlash.
Apache License 2.0

fix requests to leader may fail due to one pd disconnected #187

Closed Lloyd-Pottiger closed 1 month ago

Lloyd-Pottiger commented 1 month ago

Issue: https://github.com/pingcap/tiflash/issues/9243

https://github.com/tikv/client-c/blob/2d791221c64dcfd3bf7c6ba4ce8656ed640a8901/src/pd/Client.cc#L184-L200

Assume we have 3 PDs, urls = [pd-0, pd-1, pd-2], where pd-0 is disconnected and pd-2 is the leader.

getMembers(pd-0) then always times out, and because we hold leader_mutex during the call, the mutex stays held for more than pd_timeout seconds.

Other requests such as getRegionByKey must acquire leader_mutex before sending, so they wait more than pd_timeout seconds for the mutex, which causes those requests to time out too.

https://github.com/tikv/client-c/blob/2d791221c64dcfd3bf7c6ba4ce8656ed640a8901/src/pd/Client.cc#L353-L375
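
To make the contention concrete, here is a minimal, self-contained sketch. It is not the client-c code and not necessarily the fix in this PR: `MiniPDClient`, `updateLeaderHoldingLock`, `updateLeaderShortLock`, and `requestBlockedDuringProbe` are hypothetical names, and the fake `getMembers` simulates the pd-0 timeout with a flag instead of a real RPC. It compares holding `leader_mutex` across the probes with one possible alternative, probing without the mutex and locking only to publish the new leader.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the PD client; none of these names are the real client-c API.
struct MiniPDClient
{
    std::vector<std::string> urls{"pd-0", "pd-1", "pd-2"};
    std::string leader;
    std::mutex leader_mutex;
    std::atomic<bool> probing_pd0{false};   // set once the pd-0 probe has started
    std::atomic<bool> pd0_timed_out{false}; // the demo flips this to end the simulated timeout

    // Fake getMembers: pd-0 is unreachable and blocks until the simulated
    // timeout fires; every other PD immediately reports pd-2 as the leader.
    std::optional<std::string> getMembers(const std::string & url)
    {
        if (url == "pd-0")
        {
            probing_pd0 = true;
            while (!pd0_timed_out)
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            return std::nullopt;
        }
        return std::string("pd-2");
    }

    // Pattern described above: the mutex is held across every probe.
    void updateLeaderHoldingLock()
    {
        std::lock_guard<std::mutex> lock(leader_mutex);
        for (const auto & url : urls)
        {
            if (auto found = getMembers(url))
            {
                leader = *found;
                return;
            }
        }
    }

    // Alternative sketch: probe without the mutex, lock only to publish the result.
    void updateLeaderShortLock()
    {
        std::optional<std::string> found;
        for (const auto & url : urls)
        {
            if ((found = getMembers(url)))
                break;
        }
        if (found)
        {
            std::lock_guard<std::mutex> lock(leader_mutex);
            leader = *found;
        }
    }
};

// Returns true if a request thread would find leader_mutex unavailable
// while the pd-0 probe is still in progress.
bool requestBlockedDuringProbe(bool hold_lock_while_probing)
{
    MiniPDClient client;
    std::thread updater([&] {
        if (hold_lock_while_probing)
            client.updateLeaderHoldingLock();
        else
            client.updateLeaderShortLock();
    });
    // Wait until the updater is stuck probing pd-0.
    while (!client.probing_pd0)
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    bool blocked = !client.leader_mutex.try_lock();
    if (!blocked)
        client.leader_mutex.unlock();
    client.pd0_timed_out = true; // let the simulated timeout fire
    updater.join();
    assert(client.leader == "pd-2"); // both variants eventually find the leader
    return blocked;
}
```

With the mutex held across the probes, a concurrent request cannot take `leader_mutex` until the pd-0 probe gives up. With the short-lock variant the mutex is free during the probe, at the cost that two concurrent updates may both probe the PDs.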

Lloyd-Pottiger commented 1 month ago

/cc @JaySon-Huang @gengliqi