Closed wangting0128 closed 1 year ago
/unassign
Based on the logs, we found that many connect operations in grpcclient hit a context canceled error, which means the connect operation timed out before it finished. This can have two effects on search:
- The search request fails because getting the grpc client failed, and the error is returned to the user.
- The background qn (querynode) health check fails, which marks the qn as unreachable. Search requests then have no qn to execute on, and "all available nodes are unreachable: no available node" is returned to the user.

fix: increase the value of the grpc client dial timeout
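To make the two failure paths above concrete, here is a toy model in Python. It is only a sketch: `Node`, `connect`, `health_check`, and `search` are hypothetical names, the connect times are made up, and none of this is the actual Milvus grpcclient code.

```python
from dataclasses import dataclass


class DialTimeout(Exception):
    """Raised when a simulated connect does not finish within the dial timeout."""


@dataclass
class Node:
    name: str
    connect_time_s: float  # hypothetical time the connect needs to complete
    reachable: bool = True


def connect(node, dial_timeout_s):
    # The connect is cancelled if it needs longer than the dial timeout.
    if node.connect_time_s > dial_timeout_s:
        raise DialTimeout(f"connect to {node.name}: context canceled")
    return f"client-for-{node.name}"


def health_check(nodes, dial_timeout_s):
    # Effect 2: a failed background check marks the node unreachable.
    for node in nodes:
        try:
            connect(node, dial_timeout_s)
            node.reachable = True
        except DialTimeout:
            node.reachable = False


def search(nodes, dial_timeout_s):
    candidates = [n for n in nodes if n.reachable]
    if not candidates:
        return "all available nodes are unreachable: no available node"
    try:
        # Effect 1: failing to get a client fails the search itself.
        connect(candidates[0], dial_timeout_s)
    except DialTimeout as exc:
        return f"search failed: {exc}"
    return "ok"
```

In this model, a dial timeout shorter than the slowest connect first fails individual searches, and after one health check pass every search gets the "no available node" error. Raising the dial timeout above the connect time clears both, which is the reasoning behind the proposed fix.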
Increasing the dial timeout may not help in failure cases? On the other hand, should Milvus retry when a connect times out?
We also believe a few failures under chaos are fine; we should focus more on recovery time.
Set the timeout to a reasonable value, for instance 10s. The target we are looking for is that the system recovers within 30s.
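A rough sketch of what "retry on connect timeout, with a 10s per-attempt timeout and a 30s recovery target" could look like. This is an illustration in Python, not Milvus code: `connect_with_retry`, the `dial` callback, and the 1s backoff are assumptions, and the clock and sleep functions are injectable only so the logic can be exercised without real waiting.

```python
import time


def connect_with_retry(dial, attempt_timeout_s=10.0, recovery_budget_s=30.0,
                       backoff_s=1.0, clock=time.monotonic, sleep=time.sleep):
    """Call dial(timeout_s) until it succeeds or the recovery budget is spent.

    dial is a hypothetical callable that raises TimeoutError when the
    connect does not finish within timeout_s.
    """
    deadline = clock() + recovery_budget_s
    attempts = 0
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(
                f"not recovered within {recovery_budget_s}s after {attempts} attempts"
            )
        attempts += 1
        try:
            # Never let a single attempt run past the overall deadline.
            return dial(min(attempt_timeout_s, remaining))
        except TimeoutError:
            # Short pause before the next attempt, capped by the deadline.
            sleep(min(backoff_s, max(0.0, deadline - clock())))
```

The point of separating the two values is that a single slow connect costs at most 10s, while the caller still gets a hard failure once the 30s recovery target is missed.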
image: master-20230718-73512c72 server argo task: fouramf-9sc5q clients argo task: fouramf-concurrent-wwn8f
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
lb-helm-ivfsq8-scene8-high-etcd-0 1/1 Running 0 23h 10.104.23.119 4am-node27 <none> <none>
lb-helm-ivfsq8-scene8-high-etcd-1 1/1 Running 0 23h 10.104.17.72 4am-node23 <none> <none>
lb-helm-ivfsq8-scene8-high-etcd-2 1/1 Running 0 23h 10.104.16.34 4am-node21 <none> <none>
lb-helm-ivfsq8-scene8-high-milvus-datacoord-86f76c7746-gf9bl 1/1 Running 0 23h 10.104.17.70 4am-node23 <none> <none>
lb-helm-ivfsq8-scene8-high-milvus-datanode-6f74bcfd6d-b56w5 1/1 Running 1 (23h ago) 23h 10.104.23.113 4am-node27 <none> <none>
lb-helm-ivfsq8-scene8-high-milvus-datanode-6f74bcfd6d-pdvlh 1/1 Running 1 (23h ago) 23h 10.104.6.189 4am-node13 <none> <none>
lb-helm-ivfsq8-scene8-high-milvus-indexcoord-6ccc6c95d-rvb9s 1/1 Running 0 23h 10.104.17.69 4am-node23 <none> <none>
lb-helm-ivfsq8-scene8-high-milvus-indexnode-554d486964-5x8cj 1/1 Running 0 23h 10.104.21.243 4am-node24 <none> <none>
lb-helm-ivfsq8-scene8-high-milvus-indexnode-554d486964-v8l4f 1/1 Running 0 23h 10.104.4.106 4am-node11 <none> <none>
lb-helm-ivfsq8-scene8-high-milvus-proxy-5f95dd46cb-9lg6d 1/1 Running 1 (23h ago) 23h 10.104.6.187 4am-node13 <none> <none>
lb-helm-ivfsq8-scene8-high-milvus-querycoord-7cc86df6b9-n25pq 1/1 Running 1 (23h ago) 23h 10.104.21.242 4am-node24 <none> <none>
lb-helm-ivfsq8-scene8-high-milvus-querynode-686894ccbf-dfc8s 1/1 Running 0 23h 10.104.16.28 4am-node21 <none> <none>
lb-helm-ivfsq8-scene8-high-milvus-querynode-686894ccbf-qxbfh 1/1 Running 0 23h 10.104.6.190 4am-node13 <none> <none>
lb-helm-ivfsq8-scene8-high-milvus-rootcoord-67499dcd65-fp2t2 1/1 Running 1 (23h ago) 23h 10.104.6.185 4am-node13 <none> <none>
lb-helm-ivfsq8-scene8-high-minio-0 1/1 Running 0 23h 10.104.23.118 4am-node27 <none> <none>
lb-helm-ivfsq8-scene8-high-minio-1 1/1 Running 0 23h 10.104.16.30 4am-node21 <none> <none>
lb-helm-ivfsq8-scene8-high-minio-2 1/1 Running 0 23h 10.104.17.73 4am-node23 <none> <none>
lb-helm-ivfsq8-scene8-high-minio-3 1/1 Running 0 23h 10.104.20.129 4am-node22 <none> <none>
lb-helm-ivfsq8-scene8-high-pulsar-bookie-0 1/1 Running 0 23h 10.104.23.123 4am-node27 <none> <none>
lb-helm-ivfsq8-scene8-high-pulsar-bookie-1 1/1 Running 0 23h 10.104.17.76 4am-node23 <none> <none>
lb-helm-ivfsq8-scene8-high-pulsar-bookie-2 1/1 Running 0 23h 10.104.20.138 4am-node22 <none> <none>
lb-helm-ivfsq8-scene8-high-pulsar-bookie-init-x8t7w 0/1 Completed 0 23h 10.104.17.66 4am-node23 <none> <none>
lb-helm-ivfsq8-scene8-high-pulsar-broker-0 1/1 Running 0 23h 10.104.17.67 4am-node23 <none> <none>
lb-helm-ivfsq8-scene8-high-pulsar-proxy-0 1/1 Running 0 23h 10.104.6.188 4am-node13 <none> <none>
lb-helm-ivfsq8-scene8-high-pulsar-pulsar-init-q25k4 0/1 Completed 0 23h 10.104.6.186 4am-node13 <none> <none>
lb-helm-ivfsq8-scene8-high-pulsar-recovery-0 1/1 Running 0 23h 10.104.23.114 4am-node27 <none> <none>
lb-helm-ivfsq8-scene8-high-pulsar-zookeeper-0 1/1 Running 0 23h 10.104.23.120 4am-node27 <none> <none>
lb-helm-ivfsq8-scene8-high-pulsar-zookeeper-1 1/1 Running 0 23h 10.104.17.79 4am-node23 <none> <none>
lb-helm-ivfsq8-scene8-high-pulsar-zookeeper-2 1/1 Running 0 23h 10.104.4.108 4am-node11 <none> <none>
proxy log:
clients log:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
image: master-20230822-9131a0aa argo task: fouramf-server-client-concurrent-r86xj
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
lb-helm-multi-ivfsq8-etcd-0 1/1 Running 3 (2m41s ago) 7m3s 10.104.12.179 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-etcd-1 1/1 Running 3 (2m34s ago) 7m3s 10.104.18.75 4am-node25 <none> <none>
lb-helm-multi-ivfsq8-etcd-2 1/1 Running 0 7m2s 10.104.13.101 4am-node16 <none> <none>
lb-helm-multi-ivfsq8-milvus-datacoord-9787dfd7d-55wdw 1/1 Running 1 (3m2s ago) 7m3s 10.104.12.171 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-milvus-datanode-7fc65867b8-cgm7t 1/1 Running 1 (3m1s ago) 7m3s 10.104.12.173 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-milvus-datanode-7fc65867b8-vzg6s 1/1 Running 1 (3m1s ago) 7m3s 10.104.18.61 4am-node25 <none> <none>
lb-helm-multi-ivfsq8-milvus-indexcoord-7b9c6d5cd6-gghpf 1/1 Running 0 7m3s 10.104.18.58 4am-node25 <none> <none>
lb-helm-multi-ivfsq8-milvus-indexnode-8649d6dd7d-rc4xh 1/1 Running 1 (3m2s ago) 7m3s 10.104.24.89 4am-node29 <none> <none>
lb-helm-multi-ivfsq8-milvus-indexnode-8649d6dd7d-zcgr8 1/1 Running 0 7m3s 10.104.13.86 4am-node16 <none> <none>
lb-helm-multi-ivfsq8-milvus-proxy-6f55cd4cc4-jk8cq 1/1 Running 1 (3m2s ago) 7m3s 10.104.24.87 4am-node29 <none> <none>
lb-helm-multi-ivfsq8-milvus-querycoord-bb89c9677-bfv59 1/1 Running 1 (3m2s ago) 7m3s 10.104.24.88 4am-node29 <none> <none>
lb-helm-multi-ivfsq8-milvus-querynode-5cf5444bd7-mk64m 1/1 Running 1 (3m2s ago) 7m2s 10.104.24.90 4am-node29 <none> <none>
lb-helm-multi-ivfsq8-milvus-querynode-5cf5444bd7-pmq6b 1/1 Running 1 (3m ago) 7m2s 10.104.12.174 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-milvus-rootcoord-7b48c986c9-5gw4g 1/1 Running 1 (3m2s ago) 7m3s 10.104.24.86 4am-node29 <none> <none>
lb-helm-multi-ivfsq8-minio-0 1/1 Running 0 7m3s 10.104.13.100 4am-node16 <none> <none>
lb-helm-multi-ivfsq8-minio-1 1/1 Running 0 7m2s 10.104.24.92 4am-node29 <none> <none>
lb-helm-multi-ivfsq8-minio-2 1/1 Running 0 7m2s 10.104.18.85 4am-node25 <none> <none>
lb-helm-multi-ivfsq8-minio-3 1/1 Running 0 7m2s 10.104.5.68 4am-node12 <none> <none>
lb-helm-multi-ivfsq8-pulsar-bookie-0 1/1 Running 0 7m3s 10.104.12.181 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-pulsar-bookie-1 1/1 Running 0 7m2s 10.104.13.104 4am-node16 <none> <none>
lb-helm-multi-ivfsq8-pulsar-bookie-2 1/1 Running 0 7m2s 10.104.18.84 4am-node25 <none> <none>
lb-helm-multi-ivfsq8-pulsar-bookie-init-p2cdn 0/1 Completed 0 7m3s 10.104.12.169 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-pulsar-broker-0 1/1 Running 0 7m3s 10.104.18.59 4am-node25 <none> <none>
lb-helm-multi-ivfsq8-pulsar-proxy-0 1/1 Running 0 7m3s 10.104.12.172 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-pulsar-pulsar-init-x4dw6 0/1 Completed 0 7m3s 10.104.12.170 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-pulsar-recovery-0 1/1 Running 0 7m3s 10.104.18.60 4am-node25 <none> <none>
lb-helm-multi-ivfsq8-pulsar-zookeeper-0 1/1 Running 0 7m3s 10.104.12.180 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-pulsar-zookeeper-1 1/1 Running 0 6m13s 10.104.18.96 4am-node25 <none> <none>
lb-helm-multi-ivfsq8-pulsar-zookeeper-2 1/1 Running 0 3m23s 10.104.4.199 4am-node11 <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
lb-helm-multi-ivfsq8-etcd-0 1/1 Running 3 (3h16m ago) 3h20m 10.104.12.179 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-etcd-1 1/1 Running 3 (3h16m ago) 3h20m 10.104.18.75 4am-node25 <none> <none>
lb-helm-multi-ivfsq8-etcd-2 1/1 Running 1 (31m ago) 3h20m 10.104.13.101 4am-node16 <none> <none>
lb-helm-multi-ivfsq8-milvus-datacoord-9787dfd7d-55wdw 1/1 Running 1 (3h16m ago) 3h20m 10.104.12.171 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-milvus-datanode-7fc65867b8-cgm7t 1/1 Running 1 (3h16m ago) 3h20m 10.104.12.173 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-milvus-datanode-7fc65867b8-vzg6s 1/1 Running 1 (3h16m ago) 3h20m 10.104.18.61 4am-node25 <none> <none>
lb-helm-multi-ivfsq8-milvus-indexcoord-7b9c6d5cd6-gghpf 1/1 Running 0 3h20m 10.104.18.58 4am-node25 <none> <none>
lb-helm-multi-ivfsq8-milvus-indexnode-8649d6dd7d-rc4xh 1/1 Running 1 (3h16m ago) 3h20m 10.104.24.89 4am-node29 <none> <none>
lb-helm-multi-ivfsq8-milvus-indexnode-8649d6dd7d-zcgr8 1/1 Running 0 3h20m 10.104.13.86 4am-node16 <none> <none>
lb-helm-multi-ivfsq8-milvus-proxy-6f55cd4cc4-jk8cq 1/1 Running 1 (3h16m ago) 3h20m 10.104.24.87 4am-node29 <none> <none>
lb-helm-multi-ivfsq8-milvus-querycoord-bb89c9677-bfv59 1/1 Running 1 (3h16m ago) 3h20m 10.104.24.88 4am-node29 <none> <none>
lb-helm-multi-ivfsq8-milvus-querynode-5cf5444bd7-mk64m 1/1 Running 1 (3h16m ago) 3h20m 10.104.24.90 4am-node29 <none> <none>
lb-helm-multi-ivfsq8-milvus-querynode-5cf5444bd7-pmq6b 1/1 Running 1 (3h16m ago) 3h20m 10.104.12.174 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-milvus-rootcoord-7b48c986c9-5gw4g 1/1 Running 1 (3h16m ago) 3h20m 10.104.24.86 4am-node29 <none> <none>
lb-helm-multi-ivfsq8-minio-0 1/1 Running 0 3h20m 10.104.13.100 4am-node16 <none> <none>
lb-helm-multi-ivfsq8-minio-1 1/1 Running 0 3h20m 10.104.24.92 4am-node29 <none> <none>
lb-helm-multi-ivfsq8-minio-2 1/1 Running 0 3h20m 10.104.18.85 4am-node25 <none> <none>
lb-helm-multi-ivfsq8-minio-3 1/1 Running 0 3h20m 10.104.5.68 4am-node12 <none> <none>
lb-helm-multi-ivfsq8-pulsar-bookie-0 1/1 Running 0 3h20m 10.104.12.181 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-pulsar-bookie-1 1/1 Running 0 3h20m 10.104.13.104 4am-node16 <none> <none>
lb-helm-multi-ivfsq8-pulsar-bookie-2 1/1 Running 0 3h20m 10.104.18.84 4am-node25 <none> <none>
lb-helm-multi-ivfsq8-pulsar-bookie-init-p2cdn 0/1 Completed 0 3h20m 10.104.12.169 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-pulsar-broker-0 1/1 Running 0 3h20m 10.104.18.59 4am-node25 <none> <none>
lb-helm-multi-ivfsq8-pulsar-proxy-0 1/1 Running 0 3h20m 10.104.12.172 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-pulsar-pulsar-init-x4dw6 0/1 Completed 0 3h20m 10.104.12.170 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-pulsar-recovery-0 1/1 Running 0 3h20m 10.104.18.60 4am-node25 <none> <none>
lb-helm-multi-ivfsq8-pulsar-zookeeper-0 1/1 Running 0 3h20m 10.104.12.180 4am-node17 <none> <none>
lb-helm-multi-ivfsq8-pulsar-zookeeper-1 1/1 Running 0 3h19m 10.104.18.96 4am-node25 <none> <none>
lb-helm-multi-ivfsq8-pulsar-zookeeper-2 1/1 Running 0 3h17m 10.104.4.199 4am-node11 <none> <none>
clients log: clients.log
For now, when testing with nq=1000, concurrent=50, timeout=60s, requests time out easily because the workload is really heavy. We recommend increasing the CPU resources or the timeout param.
More details:
The avg queue latency reaches 15s, which means the querynode can't keep up with the workload; there are always some tasks waiting in the queue.
The search latency sometimes reaches 60s, which means some requests will get a timeout error.
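As a back-of-the-envelope check on why queue latency grows under this kind of load, here is a small Python sketch based on Little's law for a saturated closed-loop test (each client resubmits as soon as its request returns). The function name and the `parallel_slots` and `service_time_s` parameters are assumptions for illustration; they are not measured values from this run.

```python
def closed_loop_latency(concurrency, parallel_slots, service_time_s):
    """Estimate mean latency and mean queue wait for a closed-loop load test.

    concurrency:    number of clients that always have one request in flight
    parallel_slots: hypothetical number of requests the server runs at once
    service_time_s: hypothetical execution time of one request, in seconds
    """
    if concurrency <= parallel_slots:
        # Not saturated: every request starts executing immediately.
        return service_time_s, 0.0
    throughput = parallel_slots / service_time_s  # requests/s when saturated
    latency = concurrency / throughput            # Little's law: N = X * R
    return latency, latency - service_time_s
```

This only gives the mean. Individual requests can wait much longer than the average, which is consistent with some requests hitting the 60s timeout while the average queue latency is 15s. Adding CPU (more parallel slots, shorter service time) lowers the estimate; raising the timeout only hides the tail.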
Increase the search timeout from 60s to 120s.
image: master-20230823-148446cf argo task: fouramf-server-client-concurrent-ivfsq8
clients:
server:
Is there an existing issue for this?
Environment
Current Behavior
server argo task: fouramf-wgc6v concurrent client argo task: fouramf-concurrent-6x44k
server:
client test result: search fail counts
search max RT
search avg RT
Expected Behavior
No response
Steps To Reproduce
Milvus Log
No response
Anything else?
fouramf-server-lb-2qn-2dn-large:
fouramf-client-sift-ivfsq8-replica2-shard2-nq1-search-high:
fouramf-client-sift-ivfsq8-replica1-shard2-nq1-search-high: