tikv / client-go

Go client for TiKV
Apache License 2.0
282 stars 222 forks source link

Unnecessary periodic safepoint checker in TiKV+PD (without TiDB) deployments #1506

Open ArthurChiao opened 2 days ago

ArthurChiao commented 2 days ago

Hi,

Noticed that there will be a safepoint checker when initiating a tikv/pd client, https://github.com/tikv/client-go/blob/master/tikv/safepoint.go#L209

which will periodically check the /tidb/store/gcworker/saved_safe_point key in PD'd embedded etcd, https://github.com/tikv/client-go/blob/master/tikv/safepoint.go#L58 https://github.com/tikv/client-go/blob/master/tikv/safepoint.go#L209

However, in TiKV+PD deployments (without TiDB), such in the JuiceFS +TiKV/PD case, this check is unnecessary, because no components will ever set that key (use xxx/gc/safe_point for GC instead).

What's more, when one or multiple PD has problems, all clients will reconnect the PD cluster, resulting in massive concurrent requests to /tidb/store/gcworker/saved_safe_point, which may freeze the embedded etcd, with PD logs as below,

[WARN] [util.go:144] ["apply request took too long"] [took=4.96746s] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/tidb/store/gcworker/saved_safe_point\" "] [response="range_response_count:0 size:7"] []
...

A stability concern for big clusters (e.g. with thousands of clients).

Maybe make this checker optional when initiating a client is better? Thanks!