milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [benchmark][cluster] When inserting and querying, the response time of the interface is getting longer and longer #15583

Closed wangting0128 closed 2 years ago

wangting0128 commented 2 years ago

Is there an existing issue for this?

Environment

- Milvus version: master-20220213-f74a9c2
- Deployment mode(standalone or cluster): cluster
- SDK version(e.g. pymilvus v2.0.0rc2): 2.0.1.dev1
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

client pod: benchmark-tag-xwtx2-71121370

client data: [1644808606]

[Screenshots: 2022-02-15 17:03:24, 17:03:37, 17:03:53, 17:04:04]

Expected Behavior

argo task: benchmark-tag-xwtx2

test yaml: client-configmap:client-random-locust-1m server-configmap:server-cluster-8c32m

server:

NAME                                                         READY   STATUS      RESTARTS   AGE    IP             NODE                      NOMINATED NODE   READINESS GATES
benchmark-tag-xwtx2-1-etcd-0                                 1/1     Running     0          10h    10.97.4.172    qa-node002.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-etcd-1                                 1/1     Running     0          10h    10.97.6.93     qa-node004.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-etcd-2                                 1/1     Running     0          10h    10.97.11.16    qa-node009.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-milvus-datacoord-754c77cc57-ptkfb      1/1     Running     0          10h    10.97.14.29    qa-node011.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-milvus-datanode-f96fff74d-tgq6q        1/1     Running     0          10h    10.97.20.194   qa-node018.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-milvus-indexcoord-7f9d885496-48vwg     1/1     Running     0          10h    10.97.11.15    qa-node009.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-milvus-indexnode-76bfd9df4f-fmqrt      1/1     Running     0          10h    10.97.20.193   qa-node018.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-milvus-proxy-6445b58ccd-7hvrt          1/1     Running     0          10h    10.97.11.11    qa-node009.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-milvus-querycoord-c7c94f447-7jh4l      1/1     Running     0          10h    10.97.11.9     qa-node009.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-milvus-querynode-5df995f999-6plx4      1/1     Running     0          10h    10.97.14.30    qa-node011.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-milvus-rootcoord-6875dbd55c-fjq84      1/1     Running     0          10h    10.97.11.8     qa-node009.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-minio-0                                1/1     Running     0          10h    10.97.20.200   qa-node018.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-minio-1                                1/1     Running     0          10h    10.97.20.204   qa-node018.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-minio-2                                1/1     Running     0          10h    10.97.20.201   qa-node018.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-minio-3                                1/1     Running     0          10h    10.97.20.202   qa-node018.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-pulsar-autorecovery-685647bbf8-4d5js   1/1     Running     0          10h    10.97.11.12    qa-node009.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-pulsar-bastion-7bcffcc7d8-csmcn        1/1     Running     0          10h    10.97.11.13    qa-node009.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-pulsar-bookkeeper-0                    1/1     Running     0          10h    10.97.10.51    qa-node008.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-pulsar-bookkeeper-1                    1/1     Running     0          10h    10.97.20.205   qa-node018.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-pulsar-bookkeeper-2                    1/1     Running     0          10h    10.97.20.207   qa-node018.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-pulsar-broker-656567c569-6vp4p         1/1     Running     0          10h    10.97.11.14    qa-node009.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-pulsar-proxy-d799f59f-s4q9m            2/2     Running     0          10h    10.97.20.195   qa-node018.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-pulsar-zookeeper-0                     1/1     Running     0          10h    10.97.4.171    qa-node002.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-pulsar-zookeeper-1                     1/1     Running     0          10h    10.97.4.173    qa-node002.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-pulsar-zookeeper-2                     1/1     Running     0          10h    10.97.4.174    qa-node002.zilliz.local   <none>           <none>
benchmark-tag-xwtx2-1-pulsar-zookeeper-metadata-v958s        0/1     Completed   0          10h    10.97.11.10    qa-node009.zilliz.local   <none>           <none>

Steps To Reproduce

1. create a collection
2. build an ivf_sq8 index
3. insert 1 million vectors
4. flush the collection
5. build the index again with the same params
6. load the collection
7. run concurrent locust tasks: query (via search), insert, get (via query), load
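The steps above map onto a pymilvus-style flow roughly as follows. This is a sketch only: real `create_index`/`insert`/`flush`/`load` calls need a running Milvus cluster, so a stub collection object stands in for the client here, and the parameter values are taken from the config in this issue:

```python
# Sketch of the reproduction flow against a pymilvus-like collection object.
# `col` can be any object exposing the same methods; a stub is used below.
INDEX_PARAMS = {"index_type": "IVF_SQ8", "metric_type": "L2", "params": {"nlist": 1024}}

def reproduce(col, vectors, ni_per=50000):
    col.create_index("vector", INDEX_PARAMS)   # 2. build an ivf_sq8 index
    for i in range(0, len(vectors), ni_per):   # 3. insert vectors in batches
        col.insert(vectors[i:i + ni_per])
    col.flush()                                # 4. flush the collection
    col.create_index("vector", INDEX_PARAMS)   # 5. rebuild with the same params
    col.load()                                 # 6. load the collection
    # 7. run concurrent locust tasks (see the client config in this issue)

class StubCollection:
    """Stand-in that records calls instead of talking to a cluster."""
    def __init__(self):
        self.calls = []
    def create_index(self, field, params):
        self.calls.append("index")
    def insert(self, batch):
        self.calls.append(f"insert:{len(batch)}")
    def flush(self):
        self.calls.append("flush")
    def load(self):
        self.calls.append("load")

col = StubCollection()
reproduce(col, list(range(100000)))  # 100k stands in for the 1M vectors
print(col.calls)
# ['index', 'insert:50000', 'insert:50000', 'flush', 'index', 'load']
```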

Anything else?

client-random-locust-1m:

locust_random_performance:
      collections:
        -
          collection_name: sift_1m_128_l2
          ni_per: 50000
          build_index: true
          index_type: ivf_sq8
          index_param:
            nlist: 1024
          task:
            types:
              -
                type: query
                weight: 20
                params:
                  top_k: 10
                  nq: 10
                  search_param:
                    nprobe: 16
              -
                type: insert
                weight: 10
                params:
                  ni_per: 500
              -
                type: load
                weight: 1
              -
                type: get
                weight: 2
                params:
                  ids_length: 10
            connection_num: 1
            clients_num: 20
            spawn_rate: 2
            during_time: 10h
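For reference, the task weights in the config above imply roughly this request mix (a quick sketch, assuming locust picks tasks in proportion to weight):

```python
# Approximate request mix implied by the locust task weights above.
weights = {"query": 20, "insert": 10, "load": 1, "get": 2}
total = sum(weights.values())  # 33
mix = {task: w / total for task, w in weights.items()}

for task, share in mix.items():
    print(f"{task}: {share:.1%}")
# query is roughly 60% of all requests, so its latency dominates the
# client-side response times shown in the charts.
```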
xiaofan-luan commented 2 years ago

duplicate of #16041

xiaofan-luan commented 2 years ago

Might not be a duplicate; possibly caused by the growing segments. Keeping it open.

yanliang567 commented 2 years ago

/assign @longjiquan

@wangting0128 please retry with latest master

wangting0128 commented 2 years ago

argo task: benchmark-backup-jqlbs

test yaml: client-configmap:client-random-locust-1m server-configmap:server-cluster-8c32m

server:

NAME                                                          READY   STATUS      RESTARTS   AGE   IP             NODE                      NOMINATED NODE   READINESS GATES
benchmark-backup-jqlbs-1-etcd-0                               1/1     Running     0          17m   10.97.16.116   qa-node013.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-etcd-1                               1/1     Running     0          17m   10.97.17.47    qa-node014.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-etcd-2                               1/1     Running     0          17m   10.97.16.118   qa-node013.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-milvus-datacoord-5fbc47f5df-nv86g    1/1     Running     0          17m   10.97.3.79     qa-node001.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-milvus-datanode-779ff54b6d-2z2tc     1/1     Running     0          17m   10.97.17.44    qa-node014.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-milvus-indexcoord-65c59c474b-wxjfq   1/1     Running     0          17m   10.97.10.155   qa-node008.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-milvus-indexnode-7d9c9bcd59-gwlzq    1/1     Running     0          17m   10.97.16.114   qa-node013.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-milvus-proxy-5f7d55dc56-p2lm8        1/1     Running     0          17m   10.97.10.154   qa-node008.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-milvus-querycoord-869f5c44b9-ptjbb   1/1     Running     0          17m   10.97.10.153   qa-node008.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-milvus-querynode-66f99d565c-ksktr    1/1     Running     0          17m   10.97.20.154   qa-node018.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-milvus-rootcoord-589ffffffc-5277j    1/1     Running     0          17m   10.97.10.152   qa-node008.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-minio-0                              1/1     Running     0          17m   10.97.12.62    qa-node015.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-minio-1                              1/1     Running     0          17m   10.97.19.64    qa-node016.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-minio-2                              1/1     Running     0          17m   10.97.19.62    qa-node016.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-minio-3                              1/1     Running     0          17m   10.97.12.64    qa-node015.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-pulsar-bookie-0                      1/1     Running     0          17m   10.97.5.64     qa-node003.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-pulsar-bookie-1                      1/1     Running     0          17m   10.97.19.68    qa-node016.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-pulsar-bookie-2                      1/1     Running     0          17m   10.97.18.220   qa-node017.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-pulsar-bookie-init-dw65t             0/1     Completed   0          17m   10.97.3.78     qa-node001.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-pulsar-broker-0                      1/1     Running     0          17m   10.97.3.77     qa-node001.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-pulsar-proxy-0                       1/1     Running     0          17m   10.97.9.32     qa-node007.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-pulsar-pulsar-init-h4tdf             0/1     Completed   0          17m   10.97.9.31     qa-node007.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-pulsar-recovery-0                    1/1     Running     0          17m   10.97.12.60    qa-node015.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-pulsar-zookeeper-0                   1/1     Running     0          17m   10.97.9.34     qa-node007.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-pulsar-zookeeper-1                   1/1     Running     0          16m   10.97.11.232   qa-node009.zilliz.local   <none>           <none>
benchmark-backup-jqlbs-1-pulsar-zookeeper-2                   1/1     Running     0          16m   10.97.16.120   qa-node013.zilliz.local   <none>           <none>

client pod: benchmark-backup-jqlbs-22284265

client data: [1653037552]

[Screenshots: 2022-05-20 15:54:55, 15:55:07, 15:55:22, 15:55:36]
longjiquan commented 2 years ago

@wangting0128 Multithreading is not well supported by Python gRPC; maybe you can try using multiple processes instead.

longjiquan commented 2 years ago

The insert latency on the Proxy side was stable, as shown below: [screenshot: insert-proxy]

longjiquan commented 2 years ago

I also tested this case with the Go SDK; latency was also stable. [screenshots: gosdk-insert, gosdk-search]

longjiquan commented 2 years ago

According to https://github.com/grpc/grpc/issues/20985, operations on the same gRPC connection share the GIL and therefore influence each other. So the latency of other APIs, such as insert and load_collection here, tracks search latency.
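The effect can be mimicked without gRPC at all: if every thread funnels calls through one shared channel-like object whose calls serialize, observed concurrency collapses to 1, while per-thread channels let calls overlap. This is a toy model of the contention, not the real gRPC internals:

```python
import threading
import time

class ConcurrencyTracker:
    """Records the peak number of calls in flight at the same time."""
    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0
    def enter(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
    def leave(self):
        with self._lock:
            self.current -= 1

class Channel:
    """Toy channel: calls on the SAME channel serialize on one lock,
    roughly like threads contending on one shared connection/GIL."""
    def __init__(self):
        self._lock = threading.Lock()
    def call(self, tracker):
        with self._lock:
            tracker.enter()
            time.sleep(0.05)  # simulated RPC latency
            tracker.leave()

def peak_concurrency(channels):
    tracker = ConcurrencyTracker()
    threads = [threading.Thread(target=ch.call, args=(tracker,)) for ch in channels]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tracker.peak

shared = Channel()
peak_shared = peak_concurrency([shared] * 4)                     # four threads, one channel
peak_separate = peak_concurrency([Channel() for _ in range(4)])  # one channel each
print(peak_shared)    # 1: calls fully serialized
print(peak_separate)  # up to 4: calls overlap
```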

xiaofan-luan commented 2 years ago

So this might not be an issue with the Python SDK? Close for now? @longjiquan

longjiquan commented 2 years ago

> so this might not be a issue for python SDK? close for now? @longjiquan

Yes, not an issue with the Python SDK. Could you help check this? @wangting0128 /assign @wangting0128