Open chyezh opened 1 month ago
/assign chyezh
@chyezh so we need to change to etcd timeout to shorter?
We need to retry on etcd fail to avoid etcd error and milvus panic. otherwise tuning the timeout to smaller could be dangerous. We can probably change to 3s with 3-5 retries
@chyezh so we need to change to etcd timeout to shorter?
We need to retry on etcd fail to avoid etcd error and milvus panic. otherwise tuning the timeout to smaller could be dangerous. We can probably change to 3s with 3-5 retries
Yes, current default request timeout of etcdKV is 10s, it's too long. I need to confirm the implementation of the etcd client, then determine the optimization.
Meanwhile, Milvus' KV interface has a bad implementation, it does not receive RPC context.
func (kv *etcdKV) WalkWithPrefix(prefix string, paginationSize int, fn func([]byte, []byte) error) error {
start := time.Now()
prefix = path.Join(kv.rootPath, prefix)
batch := int64(paginationSize)
opts := []clientv3.OpOption{
clientv3.WithSort(clientv3.SortByKey, clientv3.SortAscend),
clientv3.WithLimit(batch),
clientv3.WithRange(clientv3.GetPrefixRangeEnd(prefix)),
}
key := prefix
for {
ctx, cancel := context.WithTimeout(context.TODO(), kv.requestTimeout)
defer cancel()
resp, err := kv.getEtcdMeta(ctx, key, opts...)
Another case found.
It may be a bad case for etcd server 3.5.5
.
The write operation is sent to a follower etcd-1
at term 2, while the leader etcd-2
is lost.
The etcd-0
and etcd-1
finish the election on term 3 at 02:44:44
, etcd-0
became the new leader; so the write operation should be redirected to new leader as expected.
2024-09-19 02:44:44.347 {"level":"info","ts":"2024-09-19T02:44:44.347Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"3d1b452b9ef8ed7f is starting a new election at term 2"}
2024-09-19 02:44:44.347 {"level":"info","ts":"2024-09-19T02:44:44.347Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"3d1b452b9ef8ed7f became pre-candidate at term 2"}
2024-09-19 02:44:44.347 {"level":"info","ts":"2024-09-19T02:44:44.347Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"3d1b452b9ef8ed7f received MsgPreVoteResp from 3d1b452b9ef8ed7f at term 2"}
2024-09-19 02:44:44.347 {"level":"info","ts":"2024-09-19T02:44:44.347Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"3d1b452b9ef8ed7f [logterm: 2, index: 1524239] sent MsgPreVote request to 5390be8ecae377e2 at term 2"}
2024-09-19 02:44:44.347 {"level":"info","ts":"2024-09-19T02:44:44.347Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"3d1b452b9ef8ed7f [logterm: 2, index: 1524239] sent MsgPreVote request to d646976b154728f0 at term 2"}
2024-09-19 02:44:44.347 {"level":"info","ts":"2024-09-19T02:44:44.347Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: 3d1b452b9ef8ed7f lost leader 5390be8ecae377e2 at term 2"}
2024-09-19 02:44:44.347 {"level":"info","ts":"2024-09-19T02:44:44.347Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d646976b154728f0 [logterm: 2, index: 1524239, vote: 5390be8ecae377e2] cast MsgPreVote for 3d1b452b9ef8ed7f [logterm: 2, index: 1524239] at term 2"}
2024-09-19 02:44:44.348 {"level":"info","ts":"2024-09-19T02:44:44.348Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"3d1b452b9ef8ed7f received MsgPreVoteResp from d646976b154728f0 at term 2"}
2024-09-19 02:44:44.348 {"level":"info","ts":"2024-09-19T02:44:44.348Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"3d1b452b9ef8ed7f has received 2 MsgPreVoteResp votes and 0 vote rejections"}
2024-09-19 02:44:44.348 {"level":"info","ts":"2024-09-19T02:44:44.348Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"3d1b452b9ef8ed7f became candidate at term 3"}
2024-09-19 02:44:44.348 {"level":"info","ts":"2024-09-19T02:44:44.348Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"3d1b452b9ef8ed7f received MsgVoteResp from 3d1b452b9ef8ed7f at term 3"}
2024-09-19 02:44:44.348 {"level":"info","ts":"2024-09-19T02:44:44.348Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"3d1b452b9ef8ed7f [logterm: 2, index: 1524239] sent MsgVote request to 5390be8ecae377e2 at term 3"}
2024-09-19 02:44:44.348 {"level":"info","ts":"2024-09-19T02:44:44.348Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"3d1b452b9ef8ed7f [logterm: 2, index: 1524239] sent MsgVote request to d646976b154728f0 at term 3"}
2024-09-19 02:44:44.349 {"level":"info","ts":"2024-09-19T02:44:44.349Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d646976b154728f0 [term: 2] received a MsgVote message with higher term from 3d1b452b9ef8ed7f [term: 3]"}
2024-09-19 02:44:44.349 {"level":"info","ts":"2024-09-19T02:44:44.349Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d646976b154728f0 became follower at term 3"}
2024-09-19 02:44:44.349 {"level":"info","ts":"2024-09-19T02:44:44.349Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d646976b154728f0 [logterm: 2, index: 1524239, vote: 0] cast MsgVote for 3d1b452b9ef8ed7f [logterm: 2, index: 1524239] at term 3"}
2024-09-19 02:44:44.349 {"level":"info","ts":"2024-09-19T02:44:44.349Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: d646976b154728f0 lost leader 5390be8ecae377e2 at term 3"}
2024-09-19 02:44:44.351 {"level":"info","ts":"2024-09-19T02:44:44.350Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"3d1b452b9ef8ed7f received MsgVoteResp from d646976b154728f0 at term 3"}
2024-09-19 02:44:44.351 {"level":"info","ts":"2024-09-19T02:44:44.351Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"3d1b452b9ef8ed7f has received 2 MsgVoteResp votes and 0 vote rejections"}
2024-09-19 02:44:44.351 {"level":"info","ts":"2024-09-19T02:44:44.351Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"3d1b452b9ef8ed7f became leader at term 3"}
2024-09-19 02:44:44.351 {"level":"info","ts":"2024-09-19T02:44:44.351Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: 3d1b452b9ef8ed7f elected leader 3d1b452b9ef8ed7f at term 3"}
2024-09-19 02:44:44.351 {"level":"info","ts":"2024-09-19T02:44:44.351Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: d646976b154728f0 elected leader 3d1b452b9ef8ed7f at term 3"}
2024-09-19 02:44:50.780 {"level":"info","ts":"2024-09-19T02:44:50.780Z","caller":"etcdserver/raft.go:529","msg":"restarting local member","cluster-id":"d353a2e618515b93","local-member-id":"5390be8ecae377e2","commit-index":1524238}
2024-09-19 02:44:50.781 {"level":"info","ts":"2024-09-19T02:44:50.781Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"5390be8ecae377e2 switched to configuration voters=(4403189114337815935 6021522222292891618 15440194858660604144)"}
2024-09-19 02:44:50.781 {"level":"info","ts":"2024-09-19T02:44:50.781Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"5390be8ecae377e2 became follower at term 2"}
2024-09-19 02:44:50.781 {"level":"info","ts":"2024-09-19T02:44:50.781Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"newRaft 5390be8ecae377e2 [peers: [3d1b452b9ef8ed7f,5390be8ecae377e2,d646976b154728f0], term: 2, commit: 1524238, applied: 1520152, lastindex: 1524239, lastterm: 2]"}
2024-09-19 02:44:50.793 {"level":"info","ts":"2024-09-19T02:44:50.793Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"5390be8ecae377e2 [term: 2] received a MsgHeartbeat message with higher term from 3d1b452b9ef8ed7f [term: 3]"}
2024-09-19 02:44:50.793 {"level":"info","ts":"2024-09-19T02:44:50.793Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"5390be8ecae377e2 became follower at term 3"}
2024-09-19 02:44:50.793 {"level":"info","ts":"2024-09-19T02:44:50.793Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: 5390be8ecae377e2 elected leader 3d1b452b9ef8ed7f at term 3"}
2024-09-19 02:44:52.729 {"level":"warn","ts":"2024-09-19T02:44:52.729Z","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2024-09-19T02:44:43.729Z","time spent":"9.000073827s","remote":"10.15.1.24:57410","response type":"/etcdserverpb.KV/Put","request count":1,"request size":44,"response count":0,"response size":0,"request content":"key:\"92461f86230a8d5/kv/gid/timestamp\" value_size:8 "}
At etcd 3.5.5, there's no timeout control or redirect at server-side when leader changing.
Summary here:
Seems that we need to seprate etcd timeout and etcdkv timeout(or operation timeout) by ourselves
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
.
/reopen
Is there an existing issue for this?
Environment
Current Behavior
The etcd client in milvus accesses the etcd node namd etcd-0. The etcd-0 node is unavailable due to network partitioning and is in a state of repeated election, cannot apply write operations. The request timeout of the etcd client is too long (9 seconds in logs), so the process have been locked on the requesting node.
Expected Behavior
shorter timeout for etcd operation and retry the operation to healthy etcd node.
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
No response