milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [benchmark][cluster] After Milvus upgrade, Milvus search and create small collection insert, raise an error: "rpc deadline exceeded: Retry timeout: 300s" #21410

Closed elstic closed 1 year ago

elstic commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: 2.2.0-20221227-cfab3e40
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): kafka
- SDK version (e.g. pymilvus v2.0.0rc2): 2.2.1.dev4
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

before upgrade image: v2.2.2
after upgrade image: 2.2.0-20221227-cfab3e40

server-instance: fouram-tag-no-clean-h75vr-1
server-configmap: server-cluster-8c64m-querynode2-kafka
client-configmap: client-random-locust-search-filter-100m-ddl-replica2-con

fouram-tag-no-clean-h75vr-1-etcd-0                               1/1     Running     0               6m53s   10.104.6.66    4am-node13   <none>           <none>
fouram-tag-no-clean-h75vr-1-etcd-1                               1/1     Running     0               6m53s   10.104.5.102   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-etcd-2                               1/1     Running     0               6m53s   10.104.4.130   4am-node11   <none>           <none>
fouram-tag-no-clean-h75vr-1-kafka-0                              2/2     Running     1 (6m20s ago)   6m53s   10.104.5.104   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-kafka-1                              2/2     Running     0               6m32s   10.104.9.67    4am-node14   <none>           <none>
fouram-tag-no-clean-h75vr-1-kafka-2                              2/2     Running     0               6m32s   10.104.1.42    4am-node10   <none>           <none>
fouram-tag-no-clean-h75vr-1-kafka-exporter-5478c56c56-lpqq9      1/1     Running     4 (6m3s ago)    6m53s   10.104.1.38    4am-node10   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-datacoord-6f6d486bb5-v5hcw    1/1     Running     0               6m53s   10.104.5.97    4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-datanode-5f86c7fb9b-vnjp9     1/1     Running     1 (2m52s ago)   6m53s   10.104.5.96    4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-indexcoord-ff4df8644-j6c2f    1/1     Running     1 (2m51s ago)   6m53s   10.104.5.98    4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-indexnode-66ff647f5c-jlmrf    1/1     Running     0               6m53s   10.104.5.94    4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-proxy-5999fd5456-x5cjl        1/1     Running     1 (2m51s ago)   6m53s   10.104.5.99    4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-querycoord-56566bbb47-z7hpm   1/1     Running     1 (2m51s ago)   6m53s   10.104.5.100   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-querynode-7d94b48d79-dcg2t    1/1     Running     0               6m53s   10.104.1.40    4am-node10   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-querynode-7d94b48d79-zlsbb    1/1     Running     0               6m53s   10.104.6.68    4am-node13   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-rootcoord-79d67f84f9-6ws4c    1/1     Running     1 (2m52s ago)   6m53s   10.104.5.95    4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-minio-0                              1/1     Running     0               6m53s   10.104.5.101   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-minio-1                              1/1     Running     0               6m53s   10.104.6.69    4am-node13   <none>           <none>
fouram-tag-no-clean-h75vr-1-minio-2                              1/1     Running     0               6m53s   10.104.1.39    4am-node10   <none>           <none>
fouram-tag-no-clean-h75vr-1-minio-3                              1/1     Running     0               6m53s   10.104.4.129   4am-node11   <none>           <none>
fouram-tag-no-clean-h75vr-1-zookeeper-0                          1/1     Running     0               6m53s   10.104.5.103   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-zookeeper-1                          1/1     Running     0               6m53s   10.104.9.66    4am-node14   <none>           <none>
fouram-tag-no-clean-h75vr-1-zookeeper-2                          1/1     Running     0               6m53s   10.104.1.41    4am-node10   <none>           <none>

client pod: fouram-restart-server-28dfh-2533818456

client log: image
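
To triage a pod listing like the one above, it can help to pull out the pods whose RESTARTS column is non-zero. In the first listing, several Milvus components (datanode, indexcoord, proxy, querycoord, rootcoord) show one restart roughly 2m51s to 2m52s before the capture. The following is a minimal sketch that assumes the text is `kubectl get pods -o wide` output without the header row; `PodRow`, `parse_pods`, and `restarted` are hypothetical helper names, not part of any Milvus or Kubernetes tooling.

```python
import re
from typing import List, NamedTuple


class PodRow(NamedTuple):
    name: str
    ready: str
    status: str
    restarts: int
    age: str


# Columns: NAME READY STATUS RESTARTS [(last restart ago)] AGE ...
ROW_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<ready>\d+/\d+)\s+(?P<status>\S+)\s+"
    r"(?P<restarts>\d+)(?:\s+\((?P<last>[^)]*)\))?\s+(?P<age>\S+)"
)

# Three rows copied from the listing in this issue.
SAMPLE = """\
fouram-tag-no-clean-h75vr-1-etcd-0                               1/1     Running     0               6m53s   10.104.6.66    4am-node13   <none>           <none>
fouram-tag-no-clean-h75vr-1-kafka-0                              2/2     Running     1 (6m20s ago)   6m53s   10.104.5.104   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-proxy-5999fd5456-x5cjl        1/1     Running     1 (2m51s ago)   6m53s   10.104.5.99    4am-node12   <none>           <none>
"""


def parse_pods(text: str) -> List[PodRow]:
    """Parse pod rows; lines that do not look like pod rows are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            rows.append(
                PodRow(m["name"], m["ready"], m["status"], int(m["restarts"]), m["age"])
            )
    return rows


def restarted(rows: List[PodRow]) -> List[PodRow]:
    """Return only the pods that have restarted at least once."""
    return [r for r in rows if r.restarts > 0]


if __name__ == "__main__":
    for pod in restarted(parse_pods(SAMPLE)):
        print(pod.name, pod.restarts)
```

Comparing the restarted set before and after an upgrade is only a first hint; the pod logs are still needed to learn why a component restarted.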

Expected Behavior

No response

Steps To Reproduce

1. Create a collection.
2. Build an IVF_SQ8 index.
3. Insert 100M rows of data.
4. Run load, search, query, and scene_test.
5. Upgrade the image.
6. Sleep for 60 minutes.
7. Run search, load, query, and scene_test again ==> an error is raised.
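
The error in the title, "rpc deadline exceeded: Retry timeout: 300s", has the shape of a client-side retry budget running out while the server keeps failing. The sketch below only illustrates that general pattern; it is not the pymilvus or test-harness implementation, and `RetryTimeout` and `call_with_retry_budget` are hypothetical names. The `clock` and `sleep` parameters exist so the loop can be exercised without actually waiting.

```python
import time


class RetryTimeout(Exception):
    """Raised when the retry budget is used up (hypothetical name)."""


def call_with_retry_budget(fn, budget_s=300.0, base_delay_s=0.5, max_delay_s=10.0,
                           clock=time.monotonic, sleep=time.sleep):
    """Call fn() until it succeeds or the time budget would be exceeded."""
    start = clock()
    delay = base_delay_s
    attempts = 0
    while True:
        attempts += 1
        try:
            return fn()
        except Exception as exc:  # broad on purpose: this is only a sketch
            elapsed = clock() - start
            # Give up if waiting once more would overrun the budget.
            if elapsed + delay > budget_s:
                raise RetryTimeout(
                    f"Retry timeout: {budget_s:g}s after {attempts} attempts"
                ) from exc
            sleep(delay)
            delay = min(delay * 2, max_delay_s)  # exponential backoff, capped


if __name__ == "__main__":
    def always_unavailable():
        raise ConnectionError("backend unavailable")

    try:
        # A tiny budget keeps the demo short.
        call_with_retry_budget(always_unavailable, budget_s=2, base_delay_s=0.2)
    except RetryTimeout as err:
        print(err)
```

With a pattern like this, a backend that stays unavailable for the whole budget surfaces to the caller as a timeout rather than as the underlying error, which is why the server-side logs matter for diagnosis.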

Milvus Log

No response

Anything else?

client-random-locust-search-filter-100m-ddl-replica2-con

    locust_random_concurrent_performance:
      collections:
        -
          collection_name: sift_100m_128_l2
          other_fields: float1
          ni_per: 50000
          build_index: true
          index_type: ivf_sq8
          index_param:
            nlist: 2048
          load_param:
            replica_number: 2
          task:
            types:
              -
                type: query
                weight: 20
                params:
                  top_k: 10
                  nq: 10
                  search_param:
                    nprobe: 16
                  filters:
                    -
                      range: "{'range': {'float1': {'GT': -1.0, 'LT': collection_size * 0.5}}}"
              -
                type: load
                weight: 1
                params:
                  replica_number: 2
              -
                type: get
                weight: 10
                params:
                  ids_length: 10
              -
                type: scene_test
                weight: 2
            connection_num: 1
            clients_num: 50
            spawn_rate: 2
            during_time: 12h
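
The `weight` values in the task list above are easier to read as shares of the request mix. Assuming the harness treats them as Locust-style relative weights (an assumption; the harness code is not shown in this issue), the sketch below converts them to fractions. `TASK_WEIGHTS` copies the four weights from the config, and `task_shares` is a hypothetical helper.

```python
# Weights copied from the task list in the config above.
TASK_WEIGHTS = {"query": 20, "load": 1, "get": 10, "scene_test": 2}


def task_shares(weights):
    """Convert relative weights into fractions that sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    for name, share in task_shares(TASK_WEIGHTS).items():
        print(f"{name:<12}{share:.1%}")
```

Under that assumption, the weights sum to 33, so `query` accounts for 20/33 of the task picks and `load` for 1/33.
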
yanliang567 commented 1 year ago

/assign @jiaoew1991
/unassign

elstic commented 1 year ago

before upgrade image: 2.2.0-20221227-cfab3e40
after upgrade image: 2.2.0-20221228-d7510444

argo task: fouram-upgrade-server-c9jd9
server-instance: fouram-tag-no-clean-h75vr-1
server-configmap: server-cluster-8c64m-querynode2-kafka
client-configmap: client-random-locust-search-filter-100m-ddl-replica2-con

server:

fouram-tag-no-clean-h75vr-1-etcd-0                               1/1     Running     0               78s     10.104.6.58    4am-node13   <none>           <none>
fouram-tag-no-clean-h75vr-1-etcd-1                               1/1     Running     0               2m31s   10.104.5.161   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-etcd-2                               1/1     Running     0               3m34s   10.104.4.101   4am-node11   <none>           <none>
fouram-tag-no-clean-h75vr-1-kafka-0                              2/2     Running     1 (23h ago)     23h     10.104.5.104   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-kafka-1                              2/2     Running     0               23h     10.104.9.67    4am-node14   <none>           <none>
fouram-tag-no-clean-h75vr-1-kafka-2                              2/2     Running     0               23h     10.104.1.42    4am-node10   <none>           <none>
fouram-tag-no-clean-h75vr-1-kafka-exporter-5478c56c56-lpqq9      1/1     Running     4 (23h ago)     23h     10.104.1.38    4am-node10   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-datacoord-7cd44d6c5f-trcrt    1/1     Running     0               3m33s   10.104.6.56    4am-node13   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-datanode-768cbc786b-xb9dg     1/1     Running     0               3m36s   10.104.5.157   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-indexcoord-54bc4867d-lpj97    1/1     Running     0               3m35s   10.104.5.159   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-indexnode-697665cddc-d57cz    1/1     Running     0               3m36s   10.104.5.158   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-proxy-5cc55c847d-w8mqr        1/1     Running     0               3m36s   10.104.1.63    4am-node10   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-querycoord-76955ffbb9-j22wv   1/1     Running     0               3m34s   10.104.6.55    4am-node13   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-querynode-6997f7765-96s5m     1/1     Running     0               3m36s   10.104.1.64    4am-node10   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-querynode-6997f7765-zb5px     1/1     Running     0               2m      10.104.6.57    4am-node13   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-rootcoord-79d6499df7-9vmw8    1/1     Running     0               3m32s   10.104.5.160   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-minio-0                              1/1     Running     0               23h     10.104.5.101   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-minio-1                              1/1     Running     0               23h     10.104.6.69    4am-node13   <none>           <none>
fouram-tag-no-clean-h75vr-1-minio-2                              1/1     Running     0               23h     10.104.1.39    4am-node10   <none>           <none>
fouram-tag-no-clean-h75vr-1-minio-3                              1/1     Running     0               23h     10.104.4.129   4am-node11   <none>           <none>
fouram-tag-no-clean-h75vr-1-zookeeper-0                          1/1     Running     0               23h     10.104.5.103   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-zookeeper-1                          1/1     Running     0               23h     10.104.9.66    4am-node14   <none>           <none>
fouram-tag-no-clean-h75vr-1-zookeeper-2                          1/1     Running     0               23h     10.104.1.41    4am-node10   <none>           <none>

client log: image

elstic commented 1 year ago

before upgrade image: 2.2.0-20221229-ea89a5d8
after upgrade image: 2.2.0-20221230-b684d4ad

argo task: fouram-upgrade-server-nkv8d
server-instance: fouram-tag-no-clean-h75vr-1
server-configmap: server-cluster-8c64m-querynode2-kafka
client-configmap: client-random-locust-search-filter-100m-ddl-replica2-con

server

fouram-tag-no-clean-h75vr-1-etcd-0                               1/1     Running     0               2d      10.104.6.58    4am-node13   <none>           <none>
fouram-tag-no-clean-h75vr-1-etcd-1                               1/1     Running     0               2d      10.104.5.161   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-etcd-2                               1/1     Running     0               2d      10.104.4.101   4am-node11   <none>           <none>
fouram-tag-no-clean-h75vr-1-kafka-0                              2/2     Running     1 (3d ago)      3d      10.104.5.104   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-kafka-1                              2/2     Running     0               3d      10.104.9.67    4am-node14   <none>           <none>
fouram-tag-no-clean-h75vr-1-kafka-2                              2/2     Running     0               3d      10.104.1.42    4am-node10   <none>           <none>
fouram-tag-no-clean-h75vr-1-kafka-exporter-5478c56c56-lpqq9      1/1     Running     4 (3d ago)      3d      10.104.1.38    4am-node10   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-datacoord-54cfc5df96-qtmzh    1/1     Running     0               3m28s   10.104.4.31    4am-node11   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-datanode-786f94db69-bfzr5     1/1     Running     0               3m30s   10.104.6.32    4am-node13   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-indexcoord-bcfdc9cdf-w4skd    1/1     Running     0               3m27s   10.104.4.32    4am-node11   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-indexnode-586794f78-2lf2w     1/1     Running     0               3m30s   10.104.5.110   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-proxy-5d79965895-wwd4c        1/1     Running     0               3m30s   10.104.5.109   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-querycoord-687f8c4cd6-tckvz   1/1     Running     0               3m29s   10.104.4.30    4am-node11   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-querynode-5f74bb58cb-8qmf8    1/1     Running     0               109s    10.104.4.34    4am-node11   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-querynode-5f74bb58cb-vxsnm    1/1     Running     0               3m30s   10.104.5.111   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-milvus-rootcoord-76df87c64b-cpxmz    1/1     Running     0               3m25s   10.104.4.33    4am-node11   <none>           <none>
fouram-tag-no-clean-h75vr-1-minio-0                              1/1     Running     0               3d      10.104.5.101   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-minio-1                              1/1     Running     0               3d      10.104.6.69    4am-node13   <none>           <none>
fouram-tag-no-clean-h75vr-1-minio-2                              1/1     Running     0               3d      10.104.1.39    4am-node10   <none>           <none>
fouram-tag-no-clean-h75vr-1-minio-3                              1/1     Running     0               3d      10.104.4.129   4am-node11   <none>           <none>
fouram-tag-no-clean-h75vr-1-zookeeper-0                          1/1     Running     0               3d      10.104.5.103   4am-node12   <none>           <none>
fouram-tag-no-clean-h75vr-1-zookeeper-1                          1/1     Running     0               3d      10.104.9.66    4am-node14   <none>           <none>
fouram-tag-no-clean-h75vr-1-zookeeper-2                          1/1     Running     0               3d      10.104.1.41    4am-node10   <none>           <none>

client image

jiaoew1991 commented 1 year ago

It seems that the node where the Kafka pod is located has insufficient disk space @elstic

image
elstic commented 1 year ago

@LoveEachDay Can you help with this? We are not explicitly setting the disk size for Kafka.

LoveEachDay commented 1 year ago

> @LoveEachDay Can you help with this? We are not explicitly setting the disk size for Kafka.

@elstic The log above shows that Kafka has run out of disk space. By default, the Kafka cluster is provisioned with 300G for each of its three brokers.

You can increase the PVC storage for Kafka.
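
Before resizing the PVC, it may be worth confirming how full the broker's data volume actually is. The following is a minimal sketch using only the Python standard library; `disk_report` is a hypothetical helper, the 10% threshold is an arbitrary assumption, and the path should be replaced with the broker's data directory (the example uses the current directory because the real mount path is not given in this issue).

```python
import shutil

GIB = 1024 ** 3


def disk_report(path, min_free_ratio=0.10):
    """Summarize disk usage for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    free_ratio = usage.free / usage.total if usage.total else 0.0
    return {
        "path": path,
        "total_gib": usage.total / GIB,
        "free_gib": usage.free / GIB,
        "free_ratio": free_ratio,
        # True when free space falls below the chosen threshold.
        "low": free_ratio < min_free_ratio,
    }


if __name__ == "__main__":
    report = disk_report(".")
    print(
        f"{report['path']}: {report['free_gib']:.1f} GiB free of "
        f"{report['total_gib']:.1f} GiB (low={report['low']})"
    )
```

Because the instance in this issue is reused across runs over several days, checking free space before each run would show whether the volume fills gradually or in a single step.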

elstic commented 1 year ago

@jingkl does this problem still exist?

jingkl commented 1 year ago

This issue no longer occurs, so I am closing it.