Hi,

I noticed that there is a safepoint checker that runs when a tikv/pd client is initiated: https://github.com/tikv/client-go/blob/master/tikv/safepoint.go#L209

It periodically checks the /tidb/store/gcworker/saved_safe_point key in PD's embedded etcd: https://github.com/tikv/client-go/blob/master/tikv/safepoint.go#L58

However, in TiKV+PD deployments (without TiDB), such as in the JuiceFS + TiKV/PD case, this check is unnecessary, because no component will ever set that key (these deployments use xxx/gc/safe_point for GC instead).

What's more, when one or more PD nodes have problems, all clients reconnect to the PD cluster, resulting in massive concurrent requests to /tidb/store/gcworker/saved_safe_point, which may freeze the embedded etcd, with PD logs like this:
[WARN] [util.go:144] ["apply request took too long"] [took=4.96746s] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/tidb/store/gcworker/saved_safe_point\" "] [response="range_response_count:0 size:7"] []
...
This is a stability concern for big clusters (e.g. with thousands of clients).

Would it be better to make this checker optional when initiating a client? Thanks!