milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [benchmark][multi-replicas-loadbalance] In the scenario of multiple clients, high concurrency, and large nq, search raises error #25558

Closed wangting0128 closed 1 year ago

wangting0128 commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: master-20230711-cb721781
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): pulsar
- SDK version (e.g. pymilvus v2.0.0rc2): 2.4.0.dev73
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

server argo task: fouramf-wgc6v
concurrent client argo task: fouramf-concurrent-6x44k

server:

NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
lb-helm-multi-collections-etcd-0                                  1/1     Running     0               42h     10.104.21.108   4am-node24   <none>           <none>
lb-helm-multi-collections-etcd-1                                  1/1     Running     0               42h     10.104.23.13    4am-node27   <none>           <none>
lb-helm-multi-collections-etcd-2                                  1/1     Running     0               42h     10.104.19.200   4am-node28   <none>           <none>
lb-helm-multi-collections-milvus-datacoord-877d575-q67vg          1/1     Running     0               42h     10.104.5.13     4am-node12   <none>           <none>
lb-helm-multi-collections-milvus-datanode-6cbf5d67d7-dpntp        1/1     Running     1 (42h ago)     42h     10.104.1.34     4am-node10   <none>           <none>
lb-helm-multi-collections-milvus-datanode-6cbf5d67d7-kpcqk        1/1     Running     1 (42h ago)     42h     10.104.19.198   4am-node28   <none>           <none>
lb-helm-multi-collections-milvus-indexcoord-79579c8555-4j4dr      1/1     Running     0               42h     10.104.5.10     4am-node12   <none>           <none>
lb-helm-multi-collections-milvus-indexnode-86875d8c44-6j8tx       1/1     Running     0               42h     10.104.9.161    4am-node14   <none>           <none>
lb-helm-multi-collections-milvus-indexnode-86875d8c44-zcg5q       1/1     Running     0               42h     10.104.5.12     4am-node12   <none>           <none>
lb-helm-multi-collections-milvus-proxy-6764cc5875-74s4h           1/1     Running     1 (42h ago)     42h     10.104.5.11     4am-node12   <none>           <none>
lb-helm-multi-collections-milvus-querycoord-75d668fb-hn2n4        1/1     Running     1 (42h ago)     42h     10.104.19.197   4am-node28   <none>           <none>
lb-helm-multi-collections-milvus-querynode-97cc79578-fxvtd        1/1     Running     0               42h     10.104.17.203   4am-node23   <none>           <none>
lb-helm-multi-collections-milvus-querynode-97cc79578-zpbtg        1/1     Running     0               42h     10.104.5.16     4am-node12   <none>           <none>
lb-helm-multi-collections-milvus-rootcoord-85fd96d8b-xd87f        1/1     Running     1 (42h ago)     42h     10.104.21.102   4am-node24   <none>           <none>
lb-helm-multi-collections-minio-0                                 1/1     Running     0               42h     10.104.1.39     4am-node10   <none>           <none>
lb-helm-multi-collections-minio-1                                 1/1     Running     0               42h     10.104.23.11    4am-node27   <none>           <none>
lb-helm-multi-collections-minio-2                                 1/1     Running     0               42h     10.104.21.105   4am-node24   <none>           <none>
lb-helm-multi-collections-minio-3                                 1/1     Running     0               42h     10.104.18.88    4am-node25   <none>           <none>
lb-helm-multi-collections-pulsar-bookie-0                         1/1     Running     0               42h     10.104.1.40     4am-node10   <none>           <none>
lb-helm-multi-collections-pulsar-bookie-1                         1/1     Running     0               42h     10.104.21.109   4am-node24   <none>           <none>
lb-helm-multi-collections-pulsar-bookie-2                         1/1     Running     0               42h     10.104.23.16    4am-node27   <none>           <none>
lb-helm-multi-collections-pulsar-bookie-init-7mp2s                0/1     Completed   0               42h     10.104.19.196   4am-node28   <none>           <none>
lb-helm-multi-collections-pulsar-broker-0                         1/1     Running     0               42h     10.104.5.15     4am-node12   <none>           <none>
lb-helm-multi-collections-pulsar-proxy-0                          1/1     Running     0               42h     10.104.5.14     4am-node12   <none>           <none>
lb-helm-multi-collections-pulsar-pulsar-init-zmld4                0/1     Completed   0               42h     10.104.19.195   4am-node28   <none>           <none>
lb-helm-multi-collections-pulsar-recovery-0                       1/1     Running     0               42h     10.104.23.7     4am-node27   <none>           <none>
lb-helm-multi-collections-pulsar-zookeeper-0                      1/1     Running     0               42h     10.104.23.10    4am-node27   <none>           <none>
lb-helm-multi-collections-pulsar-zookeeper-1                      1/1     Running     0               42h     10.104.19.202   4am-node28   <none>           <none>
lb-helm-multi-collections-pulsar-zookeeper-2                      1/1     Running     0               42h     10.104.21.121   4am-node24   <none>           <none>
[Screenshots: 2023-07-13 15:06:41, 15:07:20, 15:07:35, 15:07:58, 15:08:14, 15:08:36, 15:21:13]

client test result: search fail counts

[Screenshot: 2023-07-13 15:03:56]

search max RT

[Screenshot: 2023-07-13 15:05:59]

search avg RT

[Screenshots: 2023-07-13 15:06:20, 15:21:34]

Expected Behavior

No response

Steps To Reproduce

1. Deploy a Milvus cluster with 2 queryNodes.
2. Run 10 concurrent clients of 2 types, replica=1 and replica=2, with 5 clients of each type. Each client:
   a. creates a collection with shard_num=2
   b. inserts 10m rows and builds an IVF_SQ8 index
   c. loads the collection with replica=1 or replica=2
   d. runs concurrent searches with locust: nq=100, topk=10, concurrent_number=100, timeout=60  <- search raises an error
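
For readers without the benchmark harness, step d can be approximated with a rough sketch like the one below. It is not the benchmark's locust code: it uses a thread pool in place of locust, and the host, port, collection name, vector field name, and `requests_per_worker` are all assumptions. It also assumes the collection from steps a and b already exists. The pymilvus import is done lazily so the helpers can be used without a running server.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Values taken from the steps above; the names themselves are illustrative.
DIM = 128            # SIFT vector dimension from the client config
NQ = 100             # query vectors per search request
TOP_K = 10
CONCURRENCY = 100    # parallel search workers per client
TIMEOUT_S = 60       # per-request timeout in seconds
SEARCH_PARAM = {"metric_type": "L2", "params": {"nprobe": 64}}


def make_queries(nq: int, dim: int) -> list[list[float]]:
    """Return nq random float vectors of length dim (stand-in for random_data: true)."""
    return [[random.random() for _ in range(dim)] for _ in range(nq)]


def run_search_load(collection_name: str, vector_field: str,
                    replica_number: int, requests_per_worker: int = 10) -> int:
    """Load the collection, fire concurrent searches, and return the failure count.

    Assumes the collection already exists with shards_num=2, 10m rows, and an
    IVF_SQ8 index. Host and port below are assumptions.
    """
    # Imported lazily: requires pymilvus and a reachable Milvus cluster.
    from pymilvus import Collection, connections

    connections.connect(alias="default", host="localhost", port="19530")
    collection = Collection(collection_name)
    collection.load(replica_number=replica_number)

    def worker(_: int) -> int:
        failures = 0
        for _ in range(requests_per_worker):
            try:
                collection.search(
                    data=make_queries(NQ, DIM),
                    anns_field=vector_field,
                    param=SEARCH_PARAM,
                    limit=TOP_K,
                    timeout=TIMEOUT_S,
                )
            except Exception:
                # Timeouts and "no available node" errors are counted here.
                failures += 1
        return failures

    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        return sum(pool.map(worker, range(CONCURRENCY)))
```

Calling `run_search_load("my_collection", "float_vector", replica_number=2)` (both names hypothetical) from several processes at once would roughly mirror the multi-client setup.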

Milvus Log

No response

Anything else?

fouramf-server-lb-2qn-2dn-large:

queryNode:
  resources:
    limits:
      cpu: '50.0'
      memory: 100Gi
    requests:
      cpu: '25.0'
      memory: 50Gi
  replicas: 2
indexNode:
  resources:
    limits:
      cpu: '8.0'
      memory: 8Gi
    requests:
      cpu: '5.0'
      memory: 5Gi
  replicas: 2
dataNode:
  resources:
    limits:
      cpu: '2.0'
      memory: 16Gi
    requests:
      cpu: '2.0'
      memory: 2Gi
  replicas: 2

fouramf-client-sift-ivfsq8-replica2-shard2-nq1-search-high:

    load_params:
      replica_number: 2
    collection_params:
      shards_num: 2
    dataset_params:
      dim: 128
      dataset_name: sift
      dataset_size: 10m
      ni_per: 50000
      metric_type: L2
    index_params:
      index_type: IVF_SQ8
      index_param:
        nlist: 2048
    concurrent_params:
      concurrent_number: 100
      during_time: 2h
      interval: 20
    concurrent_tasks:
      - type: search
        weight: 1
        params:
          nq: 100
          top_k: 10
          search_param:
            nprobe: 64
          timeout: 60
          random_data: true

fouramf-client-sift-ivfsq8-replica1-shard2-nq1-search-high:

    load_params:
      replica_number: 1
    collection_params:
      shards_num: 2
    dataset_params:
      dim: 128
      dataset_name: sift
      dataset_size: 10m
      ni_per: 50000
      metric_type: L2
    index_params:
      index_type: IVF_SQ8
      index_param:
        nlist: 2048
    concurrent_params:
      concurrent_number: 100
      during_time: 2h
      interval: 20
    concurrent_tasks:
      - type: search
        weight: 1
        params:
          nq: 100
          top_k: 10
          search_param:
            nprobe: 64
          timeout: 60
          random_data: true
yanliang567 commented 1 year ago

/unassign

weiliu1031 commented 1 year ago

[image]

Based on the logs, we found that many connect operations in grpcclient hit a context canceled error, which means the connect operation timed out before it finished. This can affect search in two ways:

  1. The search request fails because getting the grpc client failed, and the error is returned to the user.
  2. The background QueryNode (qn) health check request fails, which marks the qn as unreachable. Search requests then have no qn to execute on, and the user gets the error "all available nodes are unreachable: no available node".

fix: increase the grpc client dial timeout

xiaofan-luan commented 1 year ago

Increasing the dial timeout may not help in failure cases? On the other hand, shouldn't Milvus retry when a connect times out?

xiaofan-luan commented 1 year ago

And we believe that a few failures during chaos are fine. We should focus more on recovery time.

xiaofan-luan commented 1 year ago

Set the timeout to a reasonable value, for instance 10s. The target we are looking for is that the system recovers within 30s.
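
The retry idea discussed in these comments can be sketched as follows. This is only an illustration in Python, not Milvus's actual grpcclient code: `dial` is a hypothetical callable that attempts one connection with the given timeout, the 10s per-attempt timeout and 30s overall deadline come from the numbers suggested above, and the backoff values are assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    dial: Callable[[float], T],          # hypothetical: one connect attempt with a timeout
    attempt_timeout_s: float = 10.0,     # per-attempt dial timeout suggested in the thread
    deadline_s: float = 30.0,            # overall recovery target suggested in the thread
    backoff_s: float = 0.5,              # assumed initial backoff between attempts
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call dial(timeout) until it succeeds or the overall deadline passes."""
    start = clock()
    delay = backoff_s
    last_error: Optional[Exception] = None
    while True:
        remaining = deadline_s - (clock() - start)
        if remaining <= 0:
            raise TimeoutError("could not connect before the deadline") from last_error
        try:
            # Never let a single attempt run past the overall deadline.
            return dial(min(attempt_timeout_s, remaining))
        except (TimeoutError, ConnectionError) as err:
            last_error = err
            remaining = deadline_s - (clock() - start)
            # Back off before the next attempt, but do not sleep past the deadline.
            sleep(min(delay, max(remaining, 0.0)))
            delay = min(delay * 2, attempt_timeout_s)
```

Bounding both the single attempt and the total time keeps one slow connect from consuming the whole recovery budget, while still giving a transient failure a few chances to clear.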

wangting0128 commented 1 year ago

Recurrence

image: master-20230718-73512c72
server argo task: fouramf-9sc5q
clients argo task: fouramf-concurrent-wwn8f

server:

NAME                                                              READY   STATUS                   RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
lb-helm-ivfsq8-scene8-high-etcd-0                                 1/1     Running                  0                23h     10.104.23.119   4am-node27   <none>           <none>
lb-helm-ivfsq8-scene8-high-etcd-1                                 1/1     Running                  0                23h     10.104.17.72    4am-node23   <none>           <none>
lb-helm-ivfsq8-scene8-high-etcd-2                                 1/1     Running                  0                23h     10.104.16.34    4am-node21   <none>           <none>
lb-helm-ivfsq8-scene8-high-milvus-datacoord-86f76c7746-gf9bl      1/1     Running                  0                23h     10.104.17.70    4am-node23   <none>           <none>
lb-helm-ivfsq8-scene8-high-milvus-datanode-6f74bcfd6d-b56w5       1/1     Running                  1 (23h ago)      23h     10.104.23.113   4am-node27   <none>           <none>
lb-helm-ivfsq8-scene8-high-milvus-datanode-6f74bcfd6d-pdvlh       1/1     Running                  1 (23h ago)      23h     10.104.6.189    4am-node13   <none>           <none>
lb-helm-ivfsq8-scene8-high-milvus-indexcoord-6ccc6c95d-rvb9s      1/1     Running                  0                23h     10.104.17.69    4am-node23   <none>           <none>
lb-helm-ivfsq8-scene8-high-milvus-indexnode-554d486964-5x8cj      1/1     Running                  0                23h     10.104.21.243   4am-node24   <none>           <none>
lb-helm-ivfsq8-scene8-high-milvus-indexnode-554d486964-v8l4f      1/1     Running                  0                23h     10.104.4.106    4am-node11   <none>           <none>
lb-helm-ivfsq8-scene8-high-milvus-proxy-5f95dd46cb-9lg6d          1/1     Running                  1 (23h ago)      23h     10.104.6.187    4am-node13   <none>           <none>
lb-helm-ivfsq8-scene8-high-milvus-querycoord-7cc86df6b9-n25pq     1/1     Running                  1 (23h ago)      23h     10.104.21.242   4am-node24   <none>           <none>
lb-helm-ivfsq8-scene8-high-milvus-querynode-686894ccbf-dfc8s      1/1     Running                  0                23h     10.104.16.28    4am-node21   <none>           <none>
lb-helm-ivfsq8-scene8-high-milvus-querynode-686894ccbf-qxbfh      1/1     Running                  0                23h     10.104.6.190    4am-node13   <none>           <none>
lb-helm-ivfsq8-scene8-high-milvus-rootcoord-67499dcd65-fp2t2      1/1     Running                  1 (23h ago)      23h     10.104.6.185    4am-node13   <none>           <none>
lb-helm-ivfsq8-scene8-high-minio-0                                1/1     Running                  0                23h     10.104.23.118   4am-node27   <none>           <none>
lb-helm-ivfsq8-scene8-high-minio-1                                1/1     Running                  0                23h     10.104.16.30    4am-node21   <none>           <none>
lb-helm-ivfsq8-scene8-high-minio-2                                1/1     Running                  0                23h     10.104.17.73    4am-node23   <none>           <none>
lb-helm-ivfsq8-scene8-high-minio-3                                1/1     Running                  0                23h     10.104.20.129   4am-node22   <none>           <none>
lb-helm-ivfsq8-scene8-high-pulsar-bookie-0                        1/1     Running                  0                23h     10.104.23.123   4am-node27   <none>           <none>
lb-helm-ivfsq8-scene8-high-pulsar-bookie-1                        1/1     Running                  0                23h     10.104.17.76    4am-node23   <none>           <none>
lb-helm-ivfsq8-scene8-high-pulsar-bookie-2                        1/1     Running                  0                23h     10.104.20.138   4am-node22   <none>           <none>
lb-helm-ivfsq8-scene8-high-pulsar-bookie-init-x8t7w               0/1     Completed                0                23h     10.104.17.66    4am-node23   <none>           <none>
lb-helm-ivfsq8-scene8-high-pulsar-broker-0                        1/1     Running                  0                23h     10.104.17.67    4am-node23   <none>           <none>
lb-helm-ivfsq8-scene8-high-pulsar-proxy-0                         1/1     Running                  0                23h     10.104.6.188    4am-node13   <none>           <none>
lb-helm-ivfsq8-scene8-high-pulsar-pulsar-init-q25k4               0/1     Completed                0                23h     10.104.6.186    4am-node13   <none>           <none>
lb-helm-ivfsq8-scene8-high-pulsar-recovery-0                      1/1     Running                  0                23h     10.104.23.114   4am-node27   <none>           <none>
lb-helm-ivfsq8-scene8-high-pulsar-zookeeper-0                     1/1     Running                  0                23h     10.104.23.120   4am-node27   <none>           <none>
lb-helm-ivfsq8-scene8-high-pulsar-zookeeper-1                     1/1     Running                  0                23h     10.104.17.79    4am-node23   <none>           <none>
lb-helm-ivfsq8-scene8-high-pulsar-zookeeper-2                     1/1     Running                  0                23h     10.104.4.108    4am-node11   <none>           <none>

proxy log:

[Screenshot: 2023-07-19 14:17:29]

clients log:

[Screenshots: 2023-07-19 14:18:16, 14:19:03]
stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

wangting0128 commented 1 year ago

Recurrence

image: master-20230822-9131a0aa
argo task: fouramf-server-client-concurrent-r86xj

server:

NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
lb-helm-multi-ivfsq8-etcd-0                                       1/1     Running     3 (2m41s ago)   7m3s    10.104.12.179   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-etcd-1                                       1/1     Running     3 (2m34s ago)   7m3s    10.104.18.75    4am-node25   <none>           <none>
lb-helm-multi-ivfsq8-etcd-2                                       1/1     Running     0               7m2s    10.104.13.101   4am-node16   <none>           <none>
lb-helm-multi-ivfsq8-milvus-datacoord-9787dfd7d-55wdw             1/1     Running     1 (3m2s ago)    7m3s    10.104.12.171   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-milvus-datanode-7fc65867b8-cgm7t             1/1     Running     1 (3m1s ago)    7m3s    10.104.12.173   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-milvus-datanode-7fc65867b8-vzg6s             1/1     Running     1 (3m1s ago)    7m3s    10.104.18.61    4am-node25   <none>           <none>
lb-helm-multi-ivfsq8-milvus-indexcoord-7b9c6d5cd6-gghpf           1/1     Running     0               7m3s    10.104.18.58    4am-node25   <none>           <none>
lb-helm-multi-ivfsq8-milvus-indexnode-8649d6dd7d-rc4xh            1/1     Running     1 (3m2s ago)    7m3s    10.104.24.89    4am-node29   <none>           <none>
lb-helm-multi-ivfsq8-milvus-indexnode-8649d6dd7d-zcgr8            1/1     Running     0               7m3s    10.104.13.86    4am-node16   <none>           <none>
lb-helm-multi-ivfsq8-milvus-proxy-6f55cd4cc4-jk8cq                1/1     Running     1 (3m2s ago)    7m3s    10.104.24.87    4am-node29   <none>           <none>
lb-helm-multi-ivfsq8-milvus-querycoord-bb89c9677-bfv59            1/1     Running     1 (3m2s ago)    7m3s    10.104.24.88    4am-node29   <none>           <none>
lb-helm-multi-ivfsq8-milvus-querynode-5cf5444bd7-mk64m            1/1     Running     1 (3m2s ago)    7m2s    10.104.24.90    4am-node29   <none>           <none>
lb-helm-multi-ivfsq8-milvus-querynode-5cf5444bd7-pmq6b            1/1     Running     1 (3m ago)      7m2s    10.104.12.174   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-milvus-rootcoord-7b48c986c9-5gw4g            1/1     Running     1 (3m2s ago)    7m3s    10.104.24.86    4am-node29   <none>           <none>
lb-helm-multi-ivfsq8-minio-0                                      1/1     Running     0               7m3s    10.104.13.100   4am-node16   <none>           <none>
lb-helm-multi-ivfsq8-minio-1                                      1/1     Running     0               7m2s    10.104.24.92    4am-node29   <none>           <none>
lb-helm-multi-ivfsq8-minio-2                                      1/1     Running     0               7m2s    10.104.18.85    4am-node25   <none>           <none>
lb-helm-multi-ivfsq8-minio-3                                      1/1     Running     0               7m2s    10.104.5.68     4am-node12   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-bookie-0                              1/1     Running     0               7m3s    10.104.12.181   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-bookie-1                              1/1     Running     0               7m2s    10.104.13.104   4am-node16   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-bookie-2                              1/1     Running     0               7m2s    10.104.18.84    4am-node25   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-bookie-init-p2cdn                     0/1     Completed   0               7m3s    10.104.12.169   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-broker-0                              1/1     Running     0               7m3s    10.104.18.59    4am-node25   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-proxy-0                               1/1     Running     0               7m3s    10.104.12.172   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-pulsar-init-x4dw6                     0/1     Completed   0               7m3s    10.104.12.170   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-recovery-0                            1/1     Running     0               7m3s    10.104.18.60    4am-node25   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-zookeeper-0                           1/1     Running     0               7m3s    10.104.12.180   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-zookeeper-1                           1/1     Running     0               6m13s   10.104.18.96    4am-node25   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-zookeeper-2                           1/1     Running     0               3m23s   10.104.4.199    4am-node11   <none>           <none>
NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
lb-helm-multi-ivfsq8-etcd-0                                       1/1     Running     3 (3h16m ago)   3h20m   10.104.12.179   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-etcd-1                                       1/1     Running     3 (3h16m ago)   3h20m   10.104.18.75    4am-node25   <none>           <none>
lb-helm-multi-ivfsq8-etcd-2                                       1/1     Running     1 (31m ago)     3h20m   10.104.13.101   4am-node16   <none>           <none>
lb-helm-multi-ivfsq8-milvus-datacoord-9787dfd7d-55wdw             1/1     Running     1 (3h16m ago)   3h20m   10.104.12.171   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-milvus-datanode-7fc65867b8-cgm7t             1/1     Running     1 (3h16m ago)   3h20m   10.104.12.173   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-milvus-datanode-7fc65867b8-vzg6s             1/1     Running     1 (3h16m ago)   3h20m   10.104.18.61    4am-node25   <none>           <none>
lb-helm-multi-ivfsq8-milvus-indexcoord-7b9c6d5cd6-gghpf           1/1     Running     0               3h20m   10.104.18.58    4am-node25   <none>           <none>
lb-helm-multi-ivfsq8-milvus-indexnode-8649d6dd7d-rc4xh            1/1     Running     1 (3h16m ago)   3h20m   10.104.24.89    4am-node29   <none>           <none>
lb-helm-multi-ivfsq8-milvus-indexnode-8649d6dd7d-zcgr8            1/1     Running     0               3h20m   10.104.13.86    4am-node16   <none>           <none>
lb-helm-multi-ivfsq8-milvus-proxy-6f55cd4cc4-jk8cq                1/1     Running     1 (3h16m ago)   3h20m   10.104.24.87    4am-node29   <none>           <none>
lb-helm-multi-ivfsq8-milvus-querycoord-bb89c9677-bfv59            1/1     Running     1 (3h16m ago)   3h20m   10.104.24.88    4am-node29   <none>           <none>
lb-helm-multi-ivfsq8-milvus-querynode-5cf5444bd7-mk64m            1/1     Running     1 (3h16m ago)   3h20m   10.104.24.90    4am-node29   <none>           <none>
lb-helm-multi-ivfsq8-milvus-querynode-5cf5444bd7-pmq6b            1/1     Running     1 (3h16m ago)   3h20m   10.104.12.174   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-milvus-rootcoord-7b48c986c9-5gw4g            1/1     Running     1 (3h16m ago)   3h20m   10.104.24.86    4am-node29   <none>           <none>
lb-helm-multi-ivfsq8-minio-0                                      1/1     Running     0               3h20m   10.104.13.100   4am-node16   <none>           <none>
lb-helm-multi-ivfsq8-minio-1                                      1/1     Running     0               3h20m   10.104.24.92    4am-node29   <none>           <none>
lb-helm-multi-ivfsq8-minio-2                                      1/1     Running     0               3h20m   10.104.18.85    4am-node25   <none>           <none>
lb-helm-multi-ivfsq8-minio-3                                      1/1     Running     0               3h20m   10.104.5.68     4am-node12   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-bookie-0                              1/1     Running     0               3h20m   10.104.12.181   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-bookie-1                              1/1     Running     0               3h20m   10.104.13.104   4am-node16   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-bookie-2                              1/1     Running     0               3h20m   10.104.18.84    4am-node25   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-bookie-init-p2cdn                     0/1     Completed   0               3h20m   10.104.12.169   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-broker-0                              1/1     Running     0               3h20m   10.104.18.59    4am-node25   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-proxy-0                               1/1     Running     0               3h20m   10.104.12.172   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-pulsar-init-x4dw6                     0/1     Completed   0               3h20m   10.104.12.170   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-recovery-0                            1/1     Running     0               3h20m   10.104.18.60    4am-node25   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-zookeeper-0                           1/1     Running     0               3h20m   10.104.12.180   4am-node17   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-zookeeper-1                           1/1     Running     0               3h19m   10.104.18.96    4am-node25   <none>           <none>
lb-helm-multi-ivfsq8-pulsar-zookeeper-2                           1/1     Running     0               3h17m   10.104.4.199    4am-node11   <none>           <none>
[Screenshot: 2023-08-23 10:57:40]

clients log: clients.log

[Screenshot: 2023-08-23 10:56:08]
weiliu1031 commented 1 year ago

For now, with nq=1000, concurrent=50, and timeout=60s, searches time out easily because the workload is really heavy. We recommend increasing the CPU resources or the timeout parameter.

More details: the average queue latency reaches 15s, which means the querynode can't keep up with the workload and there are always tasks waiting in the queue. [image]

The search latency sometimes reaches 60s, which means some requests get a timeout error. [image]
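
To see why a saturated querynode produces both long queue waits and timeouts, here is a back-of-the-envelope model. It is a deliberately simplified sketch, not a description of Milvus's scheduler: it assumes a closed loop where every client immediately reissues a request, a fixed number of parallel worker slots, and a constant service time per request. All function names and example numbers are hypothetical, not measurements from this issue.

```python
def saturated_latency_s(clients: int, workers: int, service_time_s: float) -> float:
    """Average per-request latency in a closed loop with zero think time.

    Below saturation every request gets a worker slot immediately. Once
    clients exceed workers, throughput is capped at workers / service_time_s,
    and Little's law gives latency = clients / throughput.
    """
    if clients <= workers:
        return service_time_s
    return clients * service_time_s / workers


def queue_wait_s(clients: int, workers: int, service_time_s: float) -> float:
    """Time a request spends waiting in the queue before it is executed."""
    return saturated_latency_s(clients, workers, service_time_s) - service_time_s


def exceeds_timeout(clients: int, workers: int, service_time_s: float,
                    timeout_s: float) -> bool:
    """True if the average latency under this model is above the client timeout."""
    return saturated_latency_s(clients, workers, service_time_s) > timeout_s
```

Under this model, with hypothetical values of 100 in-flight requests, 8 worker slots, and 2s of service time, the average latency is 25s, of which 23s is queue wait. Both remedies suggested above fit the model: more CPU raises the number of worker slots or shortens the service time, and a longer timeout tolerates the queueing delay.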

wangting0128 commented 1 year ago
[Screenshot: 2023-08-23 11:08:41]


wangting0128 commented 1 year ago

verification passed

Increase the search timeout from 60s to 120s
image: master-20230823-148446cf
argo task: fouramf-server-client-concurrent-ivfsq8

clients:

[Screenshots: 2023-08-23 15:07:45, 15:08:09]

server:

[Screenshots: 2023-08-23 15:08:26, 15:08:48, 15:09:15]