milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [benchmark][cluster] milvus downtime upgrade, query slowdown #20921

Closed elstic closed 1 year ago

elstic commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: 2.2.0-20221130-c5f215da
- Deployment mode (standalone or cluster): cluster
- SDK version (e.g. pymilvus v2.0.0rc2): 2.2.0.dev72
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

server-instance: fouram-tag-no-clean-cbtcd-1
server-configmap: server-cluster-8c64m
client-configmap: client-random-locust-search-filter-100m-ddl

argo task: fouram-upgrade-server-xd8f4

before upgrade: 2.2.0-20221129-e2dd7a41
after upgrade: 2.2.0-20221130-c5f215da

server

fouram-tag-no-clean-cbtcd-1-etcd-0                               1/1     Running     0                 3m35s   10.104.6.243   4am-node13   <none>           <none>
fouram-tag-no-clean-cbtcd-1-etcd-1                               1/1     Running     0                 4m38s   10.104.5.101   4am-node12   <none>           <none>
fouram-tag-no-clean-cbtcd-1-etcd-2                               1/1     Running     0                 5m41s   10.104.4.109   4am-node11   <none>           <none>
fouram-tag-no-clean-cbtcd-1-milvus-datacoord-85bb59b457-4wjnn    1/1     Running     0                 5m43s   10.104.6.240   4am-node13   <none>           <none>
fouram-tag-no-clean-cbtcd-1-milvus-datanode-c6575fcb-r87b5       1/1     Running     0                 5m44s   10.104.4.108   4am-node11   <none>           <none>
fouram-tag-no-clean-cbtcd-1-milvus-indexcoord-54889998f6-6dgn5   1/1     Running     0                 5m43s   10.104.6.241   4am-node13   <none>           <none>
fouram-tag-no-clean-cbtcd-1-milvus-indexnode-84d98d6c97-lfbc4    1/1     Running     0                 5m44s   10.104.5.103   4am-node12   <none>           <none>
fouram-tag-no-clean-cbtcd-1-milvus-proxy-586cd5f5c8-s6sgg        1/1     Running     0                 5m44s   10.104.6.239   4am-node13   <none>           <none>
fouram-tag-no-clean-cbtcd-1-milvus-querycoord-76754fb54c-lhktw   1/1     Running     0                 5m43s   10.104.5.100   4am-node12   <none>           <none>
fouram-tag-no-clean-cbtcd-1-milvus-querynode-7b694c88f9-mll5b    1/1     Running     0                 5m43s   10.104.5.102   4am-node12   <none>           <none>
fouram-tag-no-clean-cbtcd-1-milvus-rootcoord-7c9589849-xc5zm     1/1     Running     0                 5m42s   10.104.6.242   4am-node13   <none>           <none>
fouram-tag-no-clean-cbtcd-1-minio-0                              1/1     Running     0                 24h     10.104.1.244   4am-node10   <none>           <none>
fouram-tag-no-clean-cbtcd-1-minio-1                              1/1     Running     0                 24h     10.104.5.55    4am-node12   <none>           <none>
fouram-tag-no-clean-cbtcd-1-minio-2                              1/1     Running     0                 24h     10.104.6.195   4am-node13   <none>           <none>
fouram-tag-no-clean-cbtcd-1-minio-3                              1/1     Running     0                 24h     10.104.4.58    4am-node11   <none>           <none>
fouram-tag-no-clean-cbtcd-1-pulsar-bookie-0                      1/1     Running     0                 24h     10.104.6.193   4am-node13   <none>           <none>
fouram-tag-no-clean-cbtcd-1-pulsar-bookie-1                      1/1     Running     0                 24h     10.104.5.54    4am-node12   <none>           <none>
fouram-tag-no-clean-cbtcd-1-pulsar-bookie-2                      1/1     Running     0                 24h     10.104.9.139   4am-node14   <none>           <none>
fouram-tag-no-clean-cbtcd-1-pulsar-broker-0                      1/1     Running     0                 24h     10.104.5.52    4am-node12   <none>           <none>
fouram-tag-no-clean-cbtcd-1-pulsar-proxy-0                       1/1     Running     0                 24h     10.104.5.47    4am-node12   <none>           <none>
fouram-tag-no-clean-cbtcd-1-pulsar-recovery-0                    1/1     Running     0                 24h     10.104.9.134   4am-node14   <none>           <none>
fouram-tag-no-clean-cbtcd-1-pulsar-zookeeper-0                   1/1     Running     0                 24h     10.104.5.46    4am-node12   <none>           <none>
fouram-tag-no-clean-cbtcd-1-pulsar-zookeeper-1                   1/1     Running     0                 24h     10.104.6.196   4am-node13   <none>           <none>
fouram-tag-no-clean-cbtcd-1-pulsar-zookeeper-2                   1/1     Running     0                 24h     10.104.4.60    4am-node11   <none>           <none>

client log:

[2022-11-30 14:54:06,265] [   DEBUG] - Milvus query run in 0.4986s (milvus_benchmark.client:57)
[2022-11-30 14:54:06,265] [   DEBUG] - Milvus query run in 0.8033s (milvus_benchmark.client:57)
[2022-11-30 14:54:06,662] [    INFO] - Create collection: <scene_test_6719_494861> successfully (milvus_benchmark.client:158)
[2022-11-30 14:54:06,662] [   DEBUG] - Milvus create_collection run in 0.7043s (milvus_benchmark.client:57)
[2022-11-30 14:54:06,708] [   DEBUG] - Milvus query run in 0.9422s (milvus_benchmark.client:57)
[2022-11-30 14:54:06,709] [   DEBUG] - Milvus query run in 0.7551s (milvus_benchmark.client:57)
[2022-11-30 14:54:06,710] [   DEBUG] - Milvus query run in 0.7519s (milvus_benchmark.client:57)
[2022-11-30 14:54:06,710] [   DEBUG] - Milvus query run in 0.7556s (milvus_benchmark.client:57)
[2022-11-30 14:54:07,121] [   DEBUG] - Milvus insert run in 0.857s (milvus_benchmark.client:57)
[2022-11-30 14:54:07,122] [   DEBUG] - [scene_test] Start flush : scene_test_7116_361775 (milvus_benchmark.client:643)
[2022-11-30 14:54:07,122] [   DEBUG] - Milvus query run in 0.8567s (milvus_benchmark.client:57)
[2022-11-30 14:54:07,123] [   DEBUG] - Milvus query run in 1.1639s (milvus_benchmark.client:57)
[2022-11-30 14:54:07,123] [   DEBUG] - Milvus query run in 1.1641s (milvus_benchmark.client:57)
[2022-11-30 14:54:07,123] [   DEBUG] - Milvus query run in 1.1648s (milvus_benchmark.client:57)
[2022-11-30 14:54:07,124] [   DEBUG] - [scene_test] Start scene test : scene_test_6568_324617 (milvus_benchmark.client:634)
[2022-11-30 14:54:07,438] [   DEBUG] - Milvus query run in 0.7274s (milvus_benchmark.client:57)
[2022-11-30 14:54:07,439] [   DEBUG] - Milvus query run in 1.1733s (milvus_benchmark.client:57)
[2022-11-30 14:54:07,650] [   DEBUG] - Milvus get run in 0.5265s (milvus_benchmark.client:57)
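
To compare latency before and after the upgrade, the client log can be grouped by operation. The following is a minimal sketch, assuming every timing entry follows the `Milvus <op> run in <t>s` pattern shown above; the `summarize` helper and the regular expression are illustrative and not part of milvus_benchmark.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches timing entries such as:
# [2022-11-30 14:54:06,265] [   DEBUG] - Milvus query run in 0.4986s (milvus_benchmark.client:57)
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+\[\s*\w+\]\s+-\s+"
    r"Milvus (?P<op>\w+) run in (?P<secs>[\d.]+)s"
)


def summarize(lines):
    """Group timing entries by operation and report count, mean and max in seconds.

    Hypothetical helper for illustration; lines that are not timing entries are skipped.
    """
    durations = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            durations[match.group("op")].append(float(match.group("secs")))
    return {
        op: {"count": len(values), "mean": mean(values), "max": max(values)}
        for op, values in durations.items()
    }


# A few lines copied from the client log above.
sample = [
    "[2022-11-30 14:54:06,265] [   DEBUG] - Milvus query run in 0.4986s (milvus_benchmark.client:57)",
    "[2022-11-30 14:54:06,265] [   DEBUG] - Milvus query run in 0.8033s (milvus_benchmark.client:57)",
    "[2022-11-30 14:54:06,662] [    INFO] - Create collection: <scene_test_6719_494861> successfully (milvus_benchmark.client:158)",
    "[2022-11-30 14:54:06,662] [   DEBUG] - Milvus create_collection run in 0.7043s (milvus_benchmark.client:57)",
    "[2022-11-30 14:54:07,121] [   DEBUG] - Milvus insert run in 0.857s (milvus_benchmark.client:57)",
    "[2022-11-30 14:54:07,123] [   DEBUG] - Milvus query run in 1.1639s (milvus_benchmark.client:57)",
    "[2022-11-30 14:54:07,650] [   DEBUG] - Milvus get run in 0.5265s (milvus_benchmark.client:57)",
]

summary = summarize(sample)
for op, stats in sorted(summary.items()):
    print(f"{op}: n={stats['count']} mean={stats['mean']:.4f}s max={stats['max']:.4f}s")
```

Running the same summary over a log window taken before the upgrade and one taken after would give a per-operation comparison instead of eyeballing individual lines.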

grafana: [image] [image]

Problem: latency should not increase again after its initial decrease.

Expected Behavior

Steps To Reproduce

1. Upgrade Milvus with a restart (downtime upgrade)
2. Create a collection
3. Build an ivf_sq8 index
4. Insert 100m rows of data
5. Build the index again
6. search, query, load, scene_test  ===>  query slows down

Milvus Log

No response

Anything else?

client-random-locust-search-filter-100m-ddl

    locust_random_concurrent_performance:
      collections:
        -
          collection_name: sift_100m_128_l2
          # collection_name: sift_1m_128_l2
          other_fields: float1
          ni_per: 50000
          build_index: true
          index_type: ivf_sq8
          index_param:
            nlist: 2048
          task:
            types:
              -
                type: query
                weight: 20
                params:
                  top_k: 10
                  nq: 10
                  search_param:
                    nprobe: 16
                  filters:
                    -
                      range: "{'range': {'float1': {'GT': -1.0, 'LT': collection_size * 0.5}}}"
              -
                type: load
                weight: 1
              -
                type: get
                weight: 10
                params:
                  ids_length: 10
              -
                type: scene_test
                weight: 2
            connection_num: 1
            clients_num: 20
            spawn_rate: 2
            # during_time: 60h
            during_time: 12h
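
For reading the config above, it can help to work out what share of requests each task type receives and what the range filter evaluates to. This is a rough sketch under two assumptions that the issue does not confirm: that tasks are picked in proportion to `weight`, and that `collection_size` is 100,000,000 (inferred from the `sift_100m_128_l2` name). The helper names are hypothetical.

```python
# Task weights copied from the config above.
weights = {"query": 20, "load": 1, "get": 10, "scene_test": 2}


def task_shares(task_weights):
    """Return each task's expected fraction of requests.

    Assumes tasks are selected with probability proportional to their weight.
    """
    total = sum(task_weights.values())
    return {name: weight / total for name, weight in task_weights.items()}


def filter_bounds(collection_size, fraction=0.5, lower=-1.0):
    """Evaluate the GT/LT bounds of the float1 range filter for a given size."""
    return lower, collection_size * fraction


shares = task_shares(weights)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")

# Assumed collection size of 100 million rows.
low, high = filter_bounds(100_000_000)
print(f"float1 > {low} and float1 < {high}")
```

Under these assumptions, query makes up about 61% of requests, get about 30%, scene_test about 6%, and load about 3%, so a change in query latency dominates the overall numbers.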

yanliang567 commented 1 year ago

/assign @jiaoew1991 /unassign

bigsheeper commented 1 year ago

[image]

In-queue latency is quite high.

elstic commented 1 year ago

After checking, the latency in this scenario is normal.