[etcdutil.go:136] ["kv gets too slow"] [request-key=/ms/7187976276065784319/tso/00000/dc-location] [cost=10.000692425s]
Maybe the etcd client is still using the old connection.
@lhy1024 Any progress on this issue?
I am trying to reproduce it.
This issue includes at least two parts, one of which is a watch issue for the secondary.
Regarding whether the delete event was missed: "current leadership is deleted" appears after "required revision has been compacted", indicating that the delete event came later than the previously watched revision 1800213, so the earlier delete event was not missed.
However, we cannot rule out a problem in the watcher's handling of the compact revision. The pd log shows that the compaction to 2353567 was executed at 2:26, pd was restarted at 3:07, but the tso secondary only received the compaction from 1800213 to 2353567 at 3:21. That compaction notice should have been received half an hour earlier.
The other part is about connections.
I found a similar issue where requests failed for 15 minutes after a pod update and then recovered; the comments there show it is related to TCP_USER_TIMEOUT.
With tcp_retries2 alone, a broken connection may not be torn down promptly when the keepalive mechanism is not triggered; the documentation says the timeout can be anywhere between 13 and 30 minutes, and that TCP_USER_TIMEOUT can be used to configure it.
According to https://github.com/grpc/proposal/blob/master/A18-tcp-user-timeout.md, since https://github.com/grpc/grpc-go/pull/2307, TCP_USER_TIMEOUT can be configured via KeepaliveParams.
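For reference, a minimal sketch of how client keepalive parameters translate into TCP_USER_TIMEOUT in grpc-go per A18 (the `dialWithUserTimeout` helper, the target, and the durations are made up for illustration, not taken from PD):

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

// dialWithUserTimeout dials a gRPC target with client keepalive enabled.
// Per A18 / grpc-go#2307, enabling keepalive also sets TCP_USER_TIMEOUT on
// the socket to the keepalive Timeout, so writes to a dead peer fail quickly
// instead of waiting for tcp_retries2 (potentially 13-30 minutes).
func dialWithUserTimeout(target string) (*grpc.ClientConn, error) {
	kp := keepalive.ClientParameters{
		Time:                10 * time.Second, // ping the server every 10s when idle
		Timeout:             3 * time.Second,  // close the connection if no ack within 3s
		PermitWithoutStream: true,             // keep pinging even with no active RPCs
	}
	return grpc.Dial(target,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithKeepaliveParams(kp),
	)
}
```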
Unfortunately, my local attempts with iptables and tcpkill did not reproduce this timeout.
We can consider temporarily switching back to multi-endpoint + keepalive to avoid this problem, but note that this will cause the pd-leader-io-hang case to always fail.
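Roughly, the multi-endpoint + keepalive configuration on the etcd client side could look like the sketch below (the `newEtcdClient` helper, endpoint names, and durations are illustrative only):

```go
package main

import (
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// newEtcdClient builds an etcd client with several endpoints and client-side
// keepalive, so that when one connection silently breaks the balancer can
// fail over and the dead connection is detected by the keepalive timeout.
func newEtcdClient() (*clientv3.Client, error) {
	return clientv3.New(clientv3.Config{
		Endpoints: []string{
			"http://pd-0.example:2379",
			"http://pd-1.example:2379",
			"http://pd-2.example:2379",
		},
		DialTimeout:          5 * time.Second,
		DialKeepAliveTime:    10 * time.Second, // send a keepalive ping every 10s
		DialKeepAliveTimeout: 3 * time.Second,  // drop the connection after 3s without an ack
	})
}
```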
In the meantime, before investigating the watch problem on the secondary further, we can also introduce withRequireLeader, handle closeErr, and resume the watch with revision = wresp.Header.Revision + 1, as in the sketch below.
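A minimal sketch of such a watch loop (the `watchKey` helper and its details are assumptions for illustration, not the actual leadership watch code in PD):

```go
package main

import (
	"context"
	"log"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// watchKey watches key starting from startRev, restarting the watch when the
// channel closes, resetting to the compact revision when the required revision
// has been compacted, and otherwise resuming from the last observed revision + 1.
func watchKey(ctx context.Context, client *clientv3.Client, key string, startRev int64) {
	revision := startRev
	for ctx.Err() == nil {
		// WithRequireLeader makes the watch return an error if the member the
		// client is connected to loses its leader, instead of hanging.
		wctx := clientv3.WithRequireLeader(ctx)
		wch := client.Watch(wctx, key, clientv3.WithRev(revision))
		for wresp := range wch {
			if wresp.CompactRevision != 0 {
				// The required revision was compacted; resume from the
				// compact revision rather than retrying the stale one.
				revision = wresp.CompactRevision
				break
			}
			if err := wresp.Err(); err != nil {
				log.Printf("watch channel error: %v", err)
				break
			}
			for _, ev := range wresp.Events {
				log.Printf("event %s on key %s", ev.Type, ev.Kv.Key)
			}
			// If the watch later restarts, resume right after what we have seen.
			revision = wresp.Header.Revision + 1
		}
		// The watch channel was closed (the closeErr case); loop and re-create it.
	}
}
```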
Maybe there is no problem with the handling of the compact revision after all; I checked the log again.
[2023/06/01 13:53:24.673 +00:00] [INFO] [server.go:1786] ["update tso primary"] [primary=http://pd-tso-server-1.tso-service.tidb-serverless.svc:2379]
[2023/06/01 13:53:24.673 +00:00] [INFO] [leadership.go:122] ["check campaign resp"] [resp="{\"header\":{\"cluster_id\":10920548605668718190,\"member_id\":13485048042260555703,\"revision\":1800213,\"raft_term\":20},\"succeeded\":true,\"responses\":[{\"Response\":{\"ResponsePut\":{\"header\":{\"revision\":1800213}}}}]}"]
**{... no "update tso primary" entries in the log}**
[2023/06/01 15:26:50.423 +00:00] [INFO] [kvstore_compaction.go:56] ["finished scheduled compaction"] [compact-revision=1825171] [took=1.166314444s]
**{... more log entries about compaction}**
[2023/06/02 02:26:50.588 +00:00] [INFO] [kvstore_compaction.go:56] ["finished scheduled compaction"] [compact-revision=2353567] [took=1.179783116s]
[2023/06/02 03:06:35.932 +00:00] [INFO] [versioninfo.go:89] ["Welcome to Placement Driver (API SERVICE)"](pd-0)
[2023/06/02 03:06:36.507 +00:00] [INFO] [versioninfo.go:89] ["Welcome to Placement Driver (API SERVICE)"](pd-2)
[2023/06/02 03:06:42.576 +00:00] [INFO] [server.go:1786] ["update tso primary"] [primary=http://pd-tso-server-1.tso-service.tidb-serverless.svc:2379](pd-0)
[2023/06/02 03:06:49.009 +00:00] [INFO] [server.go:1786] ["update tso primary"] [primary=http://pd-tso-server-1.tso-service.tidb-serverless.svc:2379](pd-2)
[2023/06/02 03:21:53.564 +00:00] [WARN] [leadership.go:194] ["required revision has been compacted, use the compact revision"] [required-revision=1800213] [compact-revision=2353567]
[2023/06/02 03:21:53.808 +00:00] [INFO] [server.go:1786] ["update tso primary"] [primary=http://pd-tso-server-0.tso-service.tidb-serverless.svc:2379]
These logs show that at 13:53 tso last updated the primary to tso-1, updates to other keys then triggered multiple compactions, and when pd restarted at 03:06 it still saw tso-1, until the watch recovered and the primary was re-elected later.
To verify this guess, I wrote a simple unit test that updates other keys and then compacts, to see whether the watcher receives any message.
// the watcher is watching "TestWatcherBreak"
// some other operations
suite.put("TestWatcherBreak", "3")
// put other key
for i := 0; i < 1000; i++ {
suite.put(fmt.Sprintf("TestWatcherBreak/%d", i), fmt.Sprintf("4_%d", i))
}
// the revision is now greater than 1000
resp, err := EtcdKVGet(suite.client, "TestWatcherBreak")
suite.NoError(err)
suite.Greater(int(resp.Header.Revision), 1000)
// compact; the watcher does not receive any message
revision := resp.Header.Revision
resp2, err := suite.etcd.Server.Compact(suite.ctx, &etcdserverpb.CompactionRequest{Revision: revision})
suite.NoError(err)
suite.Equal(revision, resp2.Header.Revision)
// put the key again so the watcher updates its revision
time.Sleep(time.Second * 10)
suite.put("TestWatcherBreak", "5")
This test shows that if only a compaction happens, the watcher does not receive anything until the key it is watching is updated again.
I am also trying to reproduce it in k8s.
Enhancement Task
What did you do? In dev env,
What did you expect to see? TSO primary election within 15 seconds
What did you see instead? TSO primary election took 14 minutes
What version of PD are you using (pd-server -V)? tidbcloud/pd-cse release-6.6-keyspace 9e1e2de5194655c2e1224ce94057e7f7fb5f7eb4 tikv/pd master