milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [benchmark][cluster] Search fails after Milvus builds ivf_sq8 index, raising the error "collection:sift_100m_128_l2 or partition:[] not loaded into memory when query" #20761

Closed elstic closed 1 year ago

elstic commented 1 year ago

Environment

- Milvus version:2.2.0-20221121-efa1cf7f 
- Deployment mode(standalone or cluster):cluster
- SDK version(e.g. pymilvus v2.0.0rc2): 2.2.0.dev72
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

server-instance
fouram-tag-no-clean-c8d8n-1
server-configmap
server-cluster-8c64m-querynode2
client-configmap
client-random-locust-concurrent-replica2-search-100m-ddl

server:

fouram-tag-no-clean-c8d8n-1-etcd-0                               1/1     Running    0               7d1h    10.104.1.77    4am-node10   <none>           <none>
fouram-tag-no-clean-c8d8n-1-etcd-1                               1/1     Running    0               7d1h    10.104.9.11    4am-node14   <none>           <none>
fouram-tag-no-clean-c8d8n-1-etcd-2                               1/1     Running    0               7d1h    10.104.4.153   4am-node11   <none>           <none>
fouram-tag-no-clean-c8d8n-1-milvus-datacoord-7c95759665-5nhdt    1/1     Running    0               3m11s   10.104.9.162   4am-node14   <none>           <none>
fouram-tag-no-clean-c8d8n-1-milvus-datanode-789579c7c5-tjs7n     1/1     Running    0               3m12s   10.104.4.222   4am-node11   <none>           <none>
fouram-tag-no-clean-c8d8n-1-milvus-indexcoord-ddf974999-kg978    1/1     Running    0               3m11s   10.104.9.165   4am-node14   <none>           <none>
fouram-tag-no-clean-c8d8n-1-milvus-indexnode-856d96566d-xw9s9    1/1     Running    0               3m12s   10.104.5.126   4am-node12   <none>           <none>
fouram-tag-no-clean-c8d8n-1-milvus-proxy-d9494bcb7-7nfqp         1/1     Running    0               3m12s   10.104.6.243   4am-node13   <none>           <none>
fouram-tag-no-clean-c8d8n-1-milvus-querycoord-6dfcd65b4-8ndrc    1/1     Running    0               3m11s   10.104.9.163   4am-node14   <none>           <none>
fouram-tag-no-clean-c8d8n-1-milvus-querynode-7c6bc59df8-5dwdr    1/1     Running    0               101s    10.104.9.166   4am-node14   <none>           <none>
fouram-tag-no-clean-c8d8n-1-milvus-querynode-7c6bc59df8-m5s5d    1/1     Running    0               3m12s   10.104.6.245   4am-node13   <none>           <none>
fouram-tag-no-clean-c8d8n-1-milvus-rootcoord-757d97478b-xf6fp    1/1     Running    0               3m11s   10.104.9.164   4am-node14   <none>           <none>
fouram-tag-no-clean-c8d8n-1-minio-0                              1/1     Running    0               11d     10.104.4.91    4am-node11   <none>           <none>
fouram-tag-no-clean-c8d8n-1-minio-1                              1/1     Running    0               11d     10.104.6.157   4am-node13   <none>           <none>
fouram-tag-no-clean-c8d8n-1-minio-2                              1/1     Running    0               11d     10.104.9.53    4am-node14   <none>           <none>
fouram-tag-no-clean-c8d8n-1-minio-3                              1/1     Running    0               11d     10.104.5.188   4am-node12   <none>           <none>
fouram-tag-no-clean-c8d8n-1-pulsar-bookie-0                      1/1     Running    0               11d     10.104.4.90    4am-node11   <none>           <none>
fouram-tag-no-clean-c8d8n-1-pulsar-bookie-1                      1/1     Running    0               11d     10.104.9.54    4am-node14   <none>           <none>
fouram-tag-no-clean-c8d8n-1-pulsar-bookie-2                      1/1     Running    0               11d     10.104.1.161   4am-node10   <none>           <none>
fouram-tag-no-clean-c8d8n-1-pulsar-broker-0                      1/1     Running    0               11d     10.104.9.43    4am-node14   <none>           <none>
fouram-tag-no-clean-c8d8n-1-pulsar-proxy-0                       1/1     Running    0               11d     10.104.4.84    4am-node11   <none>           <none>
fouram-tag-no-clean-c8d8n-1-pulsar-recovery-0                    1/1     Running    0               11d     10.104.9.42    4am-node14   <none>           <none>
fouram-tag-no-clean-c8d8n-1-pulsar-zookeeper-0                   1/1     Running    0               11d     10.104.4.94    4am-node11   <none>           <none>
fouram-tag-no-clean-c8d8n-1-pulsar-zookeeper-1                   1/1     Running    0               11d     10.104.9.56    4am-node14   <none>           <none>
fouram-tag-no-clean-c8d8n-1-pulsar-zookeeper-2                   1/1     Running    0               11d     10.104.6.159   4am-node13   <none>           <none>

client log:

[2022-11-21 16:04:14,522] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:04:14.515038', 'RPC error': '2022-11-21 16:04:14.522143'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,524] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-11-21 16:04:14.522015', 'RPC error': '2022-11-21 16:04:14.524766'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,526] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:04:14.522563', 'RPC error': '2022-11-21 16:04:14.526728'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,531] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:04:14.525847', 'RPC error': '2022-11-21 16:04:14.531271'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,531] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-11-21 16:04:14.528395', 'RPC error': '2022-11-21 16:04:14.531816'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,533] [   DEBUG] - [scene_test] Start scene test : scene_test_4708_617837 (milvus_benchmark.client:634)
[2022-11-21 16:04:14,642] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:04:14.532289', 'RPC error': '2022-11-21 16:04:14.642022'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,642] [    INFO] - Create collection: <scene_test_4708_617837> successfully (milvus_benchmark.client:158)
[2022-11-21 16:04:14,642] [   DEBUG] - Milvus create_collection run in 0.1093s (milvus_benchmark.client:57)
[2022-11-21 16:04:14,645] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-11-21 16:04:14.642713', 'RPC error': '2022-11-21 16:04:14.645702'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,648] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:04:14.646037', 'RPC error': '2022-11-21 16:04:14.648933'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,651] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:04:14.649267', 'RPC error': '2022-11-21 16:04:14.651890'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,654] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:04:14.652270', 'RPC error': '2022-11-21 16:04:14.654828'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,657] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:04:14.655137', 'RPC error': '2022-11-21 16:04:14.657482'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,660] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:04:14.658338', 'RPC error': '2022-11-21 16:04:14.660729'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,663] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:04:14.661056', 'RPC error': '2022-11-21 16:04:14.663440'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,666] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:04:14.663723', 'RPC error': '2022-11-21 16:04:14.666486'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,669] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-11-21 16:04:14.666960', 'RPC error': '2022-11-21 16:04:14.669359'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,672] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:04:14.669814', 'RPC error': '2022-11-21 16:04:14.672139'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,674] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:04:14.672362', 'RPC error': '2022-11-21 16:04:14.674467'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,676] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:04:14.674698', 'RPC error': '2022-11-21 16:04:14.676916'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,679] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-11-21 16:04:14.677100', 'RPC error': '2022-11-21 16:04:14.679048'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,682] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:04:14.679284', 'RPC error': '2022-11-21 16:04:14.682201'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,685] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:04:14.682457', 'RPC error': '2022-11-21 16:04:14.685112'}> (pymilvus.decorators:108)
[2022-11-21 16:04:14,685] [   DEBUG] - [scene_test] Start scene test : scene_test_1481_63168 (milvus_benchmark.client:634)
[2022-11-21 16:04:14,700] [    INFO] - Create collection: <scene_test_1481_63168> successfully (milvus_benchmark.client:158)
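
To summarize a long client log like the one above, the RPC errors can be tallied per call type. The following is a minimal sketch using only the Python standard library; the regular expression is written against the line format shown in this report, and the function name `tally_rpc_errors` is hypothetical.

```python
import re
from collections import Counter

# Matches pymilvus RPC error lines in the format shown in this report, e.g.
# [2022-11-21 16:04:14,522] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=...)>, ...
RPC_ERROR = re.compile(
    r"\[\s*ERROR\] - RPC error: \[(?P<rpc>\w+)\], "
    r"<MilvusException: \(code=(?P<code>\d+), message=(?P<message>.*?)\)>"
)


def tally_rpc_errors(lines):
    """Count RPC errors per (rpc name, error code); non-error lines are ignored."""
    counts = Counter()
    for line in lines:
        match = RPC_ERROR.search(line)
        if match:
            counts[(match["rpc"], int(match["code"]))] += 1
    return counts


# Three lines copied from the client log above (timing details shortened).
SAMPLE = [
    "[2022-11-21 16:04:14,522] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{}> (pymilvus.decorators:108)",
    "[2022-11-21 16:04:14,524] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{}> (pymilvus.decorators:108)",
    "[2022-11-21 16:04:14,533] [   DEBUG] - [scene_test] Start scene test : scene_test_4708_617837 (milvus_benchmark.client:634)",
]

for (rpc, code), count in sorted(tally_rpc_errors(SAMPLE).items()):
    print(f"{rpc} code={code}: {count}")
```

Passing the lines of the complete client log instead of `SAMPLE` would show how the failures split between search and query.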

Expected Behavior

No response

Steps To Reproduce

1. create a collection
2. build ivf_sq8 index
3. insert 100m data
4. build index again
5. load collection
6. run search, load, query, and scene_test concurrently ==> errors are raised
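
The steps above map roughly onto the following pymilvus calls. This is only a sketch: the collection name `repro_ivf_sq8`, the field names, the host and port, and the row count are assumptions, not the benchmark's real schema, and a single-threaded run like this exercises the API sequence without reproducing the concurrency or the node failure seen in the cluster.

```python
# Index and search settings taken from the benchmark configuration in this report.
INDEX_PARAMS = {"index_type": "IVF_SQ8", "metric_type": "L2", "params": {"nlist": 2048}}
SEARCH_PARAMS = {"metric_type": "L2", "params": {"nprobe": 16}}
DIM = 128


def reproduce(host="127.0.0.1", port="19530", rows=50_000):
    """Walk through steps 1-6 once against a running Milvus 2.2 cluster."""
    # Imported here so the constants above can be inspected without pymilvus installed.
    import random
    from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

    connections.connect(alias="default", host=host, port=port)

    # 1. create a collection (hypothetical name and field names)
    schema = CollectionSchema([
        FieldSchema("id", DataType.INT64, is_primary=True),
        FieldSchema("float_vector", DataType.FLOAT_VECTOR, dim=DIM),
    ])
    collection = Collection("repro_ivf_sq8", schema)

    # 2. build the ivf_sq8 index
    collection.create_index("float_vector", INDEX_PARAMS)

    # 3. insert data (random vectors stand in for the SIFT dataset)
    ids = list(range(rows))
    vectors = [[random.random() for _ in range(DIM)] for _ in ids]
    collection.insert([ids, vectors])

    # 4. build the index again
    collection.create_index("float_vector", INDEX_PARAMS)

    # 5. load the collection with two replicas, as in the load_param above
    collection.load(replica_number=2)

    # 6. one search and one query; the benchmark runs these concurrently
    collection.search(vectors[:10], "float_vector", SEARCH_PARAMS, limit=10)
    collection.query(expr="id in [0, 1, 2]")
```

Calling `reproduce()` requires a reachable Milvus instance with at least two query nodes, since the load requests two replicas.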

Milvus Log

Complete client log:

[Uploading main-logs.txt.zip…]()

Anything else?

client-random-locust-concurrent-replica2-search-100m-ddl

    locust_random_concurrent_performance:
      collections:
        -
          collection_name: sift_100m_128_l2
          # collection_name: sift_1m_128_l2
          other_fields: float1
          ni_per: 50000
          build_index: true
          index_type: ivf_sq8
          index_param:
            nlist: 2048
          load_param:
            replica_number: 2
          task:
            types:
              -
                type: query
                weight: 20
                params:
                  top_k: 10
                  nq: 10
                  search_param:
                    nprobe: 16
              -
                type: load
                weight: 1
                params:
                  replica_number: 2
              -
                type: get
                weight: 10
                params:
                  ids_length: 10
              -
                type: scene_test
                weight: 2
            connection_num: 1
            clients_num: 20
            spawn_rate: 2
            # during_time: 30m
            during_time: 12h
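
The task weights in this configuration determine how often each operation runs. Assuming the benchmark picks task types in proportion to their weight, as Locust does for weighted tasks, the expected mix can be worked out as follows (the helper name `task_shares` is hypothetical):

```python
from fractions import Fraction

# Task weights copied from the configuration above.
WEIGHTS = {"query": 20, "load": 1, "get": 10, "scene_test": 2}


def task_shares(weights):
    """Return each task's share of all executed tasks as an exact fraction."""
    total = sum(weights.values())
    return {name: Fraction(weight, total) for name, weight in weights.items()}


for name, share in task_shares(WEIGHTS).items():
    print(f"{name}: {share} (about {float(share):.1%})")
```

Under that assumption, `load` accounts for 1 in 33 tasks, so load requests are issued regularly alongside the search and get traffic.
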

elstic commented 1 year ago

The Milvus datanode crashed, and search failed after building the hnsw index, raising the error "collection:sift_100m_128_l2 or partition:[] not loaded into memory when search"

Environment

- Milvus version:2.2.0-20221121-efa1cf7f 
- Deployment mode(standalone or cluster):cluster
- SDK version(e.g. pymilvus v2.0.0rc2): 2.2.0.dev72

Current Behavior

server-instance
fouram-tag-no-clean-p4l24-1
server-configmap
server-cluster-8c64m-querynode2
client-configmap
client-random-locust-100m-hnsw-ddl-r8-w2-60h-con

server:

fouram-tag-no-clean-p4l24-1-etcd-0                               1/1     Running     0               5m54s   10.104.9.169   4am-node14   <none>           <none>
fouram-tag-no-clean-p4l24-1-etcd-1                               1/1     Running     0               5m54s   10.104.6.251   4am-node13   <none>           <none>
fouram-tag-no-clean-p4l24-1-etcd-2                               1/1     Running     0               5m54s   10.104.5.139   4am-node12   <none>           <none>
fouram-tag-no-clean-p4l24-1-milvus-datacoord-78c96cdbb5-vq454    1/1     Running     1 (113s ago)    5m54s   10.104.5.128   4am-node12   <none>           <none>
fouram-tag-no-clean-p4l24-1-milvus-datanode-5d9bff85f8-dfbvl     1/1     Running     1 (113s ago)    5m54s   10.104.4.224   4am-node11   <none>           <none>
fouram-tag-no-clean-p4l24-1-milvus-indexcoord-74bbc6b5f8-xfkm4   1/1     Running     1 (113s ago)    5m54s   10.104.4.225   4am-node11   <none>           <none>
fouram-tag-no-clean-p4l24-1-milvus-indexnode-8b56f79ff-cnt6n     1/1     Running     0               5m54s   10.104.5.134   4am-node12   <none>           <none>
fouram-tag-no-clean-p4l24-1-milvus-proxy-75df4f4864-mdkqr        1/1     Running     1 (112s ago)    5m54s   10.104.5.129   4am-node12   <none>           <none>
fouram-tag-no-clean-p4l24-1-milvus-querycoord-78b6cc44c7-qhvct   1/1     Running     1 (112s ago)    5m54s   10.104.5.135   4am-node12   <none>           <none>
fouram-tag-no-clean-p4l24-1-milvus-querynode-5f6786db4f-dmppg    1/1     Running     0               5m54s   10.104.6.247   4am-node13   <none>           <none>
fouram-tag-no-clean-p4l24-1-milvus-querynode-5f6786db4f-jdxr2    1/1     Running     0               5m54s   10.104.5.127   4am-node12   <none>           <none>
fouram-tag-no-clean-p4l24-1-milvus-rootcoord-5784b59f-cjdzs      1/1     Running     1 (2m22s ago)   5m54s   10.104.5.132   4am-node12   <none>           <none>
fouram-tag-no-clean-p4l24-1-minio-0                              1/1     Running     0               5m54s   10.104.6.249   4am-node13   <none>           <none>
fouram-tag-no-clean-p4l24-1-minio-1                              1/1     Running     0               5m54s   10.104.9.167   4am-node14   <none>           <none>
fouram-tag-no-clean-p4l24-1-minio-2                              1/1     Running     0               5m54s   10.104.5.138   4am-node12   <none>           <none>
fouram-tag-no-clean-p4l24-1-minio-3                              1/1     Running     0               5m54s   10.104.4.226   4am-node11   <none>           <none>
fouram-tag-no-clean-p4l24-1-pulsar-bookie-0                      1/1     Running     0               5m54s   10.104.6.248   4am-node13   <none>           <none>
fouram-tag-no-clean-p4l24-1-pulsar-bookie-1                      1/1     Running     0               5m54s   10.104.9.168   4am-node14   <none>           <none>
fouram-tag-no-clean-p4l24-1-pulsar-bookie-2                      1/1     Running     0               5m54s   10.104.1.242   4am-node10   <none>           <none>
fouram-tag-no-clean-p4l24-1-pulsar-bookie-init-6mtvd             0/1     Completed   0               5m54s   10.104.5.137   4am-node12   <none>           <none>
fouram-tag-no-clean-p4l24-1-pulsar-broker-0                      1/1     Running     0               5m54s   10.104.5.131   4am-node12   <none>           <none>
fouram-tag-no-clean-p4l24-1-pulsar-proxy-0                       1/1     Running     0               5m54s   10.104.4.223   4am-node11   <none>           <none>
fouram-tag-no-clean-p4l24-1-pulsar-pulsar-init-n7nxx             0/1     Completed   0               5m54s   10.104.5.136   4am-node12   <none>           <none>
fouram-tag-no-clean-p4l24-1-pulsar-recovery-0                    1/1     Running     0               5m54s   10.104.5.130   4am-node12   <none>           <none>
fouram-tag-no-clean-p4l24-1-pulsar-zookeeper-0                   1/1     Running     0               5m54s   10.104.6.250   4am-node13   <none>           <none>
fouram-tag-no-clean-p4l24-1-pulsar-zookeeper-1                   1/1     Running     0               5m18s   10.104.4.227   4am-node11   <none>           <none>
fouram-tag-no-clean-p4l24-1-pulsar-zookeeper-2                   1/1     Running     0               4m48s   10.104.9.170   4am-node14   <none>           <none>

client log:

[2022-11-21 16:09:11,276] [   DEBUG] - 0 users have been stopped, 2 still running (locust.runners:281)
[2022-11-21 16:09:11,282] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:09:11.276462', 'RPC error': '2022-11-21 16:09:11.282228'}> (pymilvus.decorators:108)
[2022-11-21 16:09:11,283] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:09:11.277090', 'RPC error': '2022-11-21 16:09:11.283167'}> (pymilvus.decorators:108)
[2022-11-21 16:09:11,285] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-11-21 16:09:11.283056', 'RPC error': '2022-11-21 16:09:11.285709'}> (pymilvus.decorators:108)
[2022-11-21 16:09:11,286] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-11-21 16:09:11.283569', 'RPC error': '2022-11-21 16:09:11.286583'}> (pymilvus.decorators:108)
[2022-11-21 16:09:11,289] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-11-21 16:09:11.286489', 'RPC error': '2022-11-21 16:09:11.289105'}> (pymilvus.decorators:108)
[2022-11-21 16:09:11,290] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:09:11.286901', 'RPC error': '2022-11-21 16:09:11.290052'}> (pymilvus.decorators:108)
[2022-11-21 16:09:11,322] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-11-21 16:09:11.289957', 'RPC error': '2022-11-21 16:09:11.322008'}> (pymilvus.decorators:108)
[2022-11-21 16:09:12,276] [   DEBUG] - Ramping to {"MyUser": 4} (4 total users) (locust.runners:341)
[2022-11-21 16:09:12,276] [   DEBUG] - Spawning additional {"MyUser": 2} ({"MyUser": 2} already running)... (locust.runners:206)
[2022-11-21 16:09:12,276] [   DEBUG] - 4 users spawned (locust.runners:220)
[2022-11-21 16:09:12,276] [   DEBUG] - All users of class MyUser spawned (locust.runners:221)
[2022-11-21 16:09:12,277] [   DEBUG] - 0 users have been stopped, 4 still running (locust.runners:281)
[2022-11-21 16:09:12,312] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:09:12.277762', 'RPC error': '2022-11-21 16:09:12.312278'}> (pymilvus.decorators:108)
[2022-11-21 16:09:12,315] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-11-21 16:09:12.312968', 'RPC error': '2022-11-21 16:09:12.315664'}> (pymilvus.decorators:108)
[2022-11-21 16:09:12,318] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:09:12.315962', 'RPC error': '2022-11-21 16:09:12.318737'}> (pymilvus.decorators:108)
[2022-11-21 16:09:12,321] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:09:12.318933', 'RPC error': '2022-11-21 16:09:12.321608'}> (pymilvus.decorators:108)
[2022-11-21 16:09:12,323] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:09:12.321790', 'RPC error': '2022-11-21 16:09:12.323695'}> (pymilvus.decorators:108)
[2022-11-21 16:09:12,325] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-11-21 16:09:12.323873', 'RPC error': '2022-11-21 16:09:12.325701'}> (pymilvus.decorators:108)
[2022-11-21 16:09:12,327] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-11-21 16:09:12.325879', 'RPC error': '2022-11-21 16:09:12.327646'}> (pymilvus.decorators:108)
[2022-11-21 16:09:12,330] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:09:12.327819', 'RPC error': '2022-11-21 16:09:12.330014'}> (pymilvus.decorators:108)
[2022-11-21 16:09:12,332] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:09:12.330492', 'RPC error': '2022-11-21 16:09:12.332711'}> (pymilvus.decorators:108)
[2022-11-21 16:09:12,334] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-11-21 16:09:12.332894', 'RPC error': '2022-11-21 16:09:12.334780'}> (pymilvus.decorators:108)
[2022-11-21 16:09:12,336] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-11-21 16:09:12.334951', 'RPC error': '2022-11-21 16:09:12.336823'}> (pymilvus.decorators:108)
[2022-11-21 16:09:12,339] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-11-21 16:09:12.336998', 'RPC error': '2022-11-21 16:09:12.338999'}> (pymilvus.decorators:108)
[2022-11-21 16:09:12,339] [   DEBUG] - [scene_test] Start scene test : scene_test_9613_331926 (milvus_benchmark.client:634)
[2022-11-21 16:09:12,362] [    INFO] - Create collection: <scene_test_9613_331926> successfully (milvus_benchmark.client:158)
[2022-11-21 16:09:12,362] [   DEBUG] - Milvus create_collection run in 0.0233s (milvus_benchmark.client:57)
[2022-11-21 16:09:13,277] [   DEBUG] - Ramping to {"MyUser": 6} (6 total users) (locust.runners:341)
[2022-11-21 16:09:13,278] [   DEBUG] - Spawning additional {"MyUser": 2} ({"MyUser": 4} already running)... (locust.runners:206)
[2022-11-21 16:09:13,278] [   DEBUG] - 6 users spawned (locust.runners:220)
[2022-11-21 16:09:13,278] [   DEBUG] - All users of class MyUser spawned (locust.runners:221)

Expected Behavior

No response

Steps To Reproduce

1. create a collection
2. build hnsw index
3. insert 100m data
4. build index again
5. load collection
6. run search, load, query, and scene_test concurrently ==> errors are raised

Milvus Log

Complete client log:

Anything else?

pod status: image

Complete client log: main-logs2.txt.zip

client-random-locust-100m-hnsw-ddl-r8-w2-60h-con

    locust_random_concurrent_performance:
      collections:
        -
          collection_name: sift_100m_128_l2
          ni_per: 50000
          build_index: true
          index_type: hnsw
          index_param:
            M: 8
            efConstruction: 200
          task:
            types:
              -
                type: query
                weight: 8
                params:
                  top_k: 10
                  nq: 10
                  search_param:
                    ef: 16
              -
                type: load
                weight: 1
              -
                type: get
                weight: 8
                params:
                  ids_length: 10
              -
                type: scene_test
                weight: 2
            connection_num: 1
            clients_num: 20
            spawn_rate: 2
            # during_time: 1h
            during_time: 12h

datanode: image image

yanliang567 commented 1 year ago

/assign @jiaoew1991 /unassign

jiaoew1991 commented 1 year ago

/assign @weiliu1031 /unassign

weiliu1031 commented 1 year ago

Background: there is a collection, sift_100m_128_l2, which is automatically loaded when query coord restarts.

Problem: after query coord restarted, it saw three query nodes: A, B, and C, with A assigned to replica_1 and B and C assigned to replica_2. While the segments of sift_100m_128_l2 were being loaded, query node A went down, leaving no query node available in replica_1, so the load could never succeed.
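
The replica layout described above can be modeled in a few lines to show why the load gets stuck. This is a toy model, not Milvus code: the function names are hypothetical, and it assumes a load completes only when every replica has at least one live query node.

```python
# Replica layout from the analysis above: A serves replica_1; B and C serve replica_2.
REPLICAS = {"replica_1": {"A"}, "replica_2": {"B", "C"}}


def replicas_without_nodes(replicas, live_nodes):
    """Return the names of replicas that have no live query node left."""
    return sorted(name for name, nodes in replicas.items() if not (nodes & live_nodes))


def can_finish_loading(replicas, live_nodes):
    # Assumption: every replica needs at least one live node to hold its segments.
    return not replicas_without_nodes(replicas, live_nodes)


print(can_finish_loading(REPLICAS, {"A", "B", "C"}))  # all nodes alive
print(replicas_without_nodes(REPLICAS, {"B", "C"}))   # after node A goes down
print(can_finish_loading(REPLICAS, {"B", "C"}))
```

Once A is gone, replica_1 has nowhere to place segments even though replica_2 still has two healthy nodes.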

Needs fixing:

  1. Make this automatic collection load time out as expected.
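
The timeout behavior asked for here can be sketched as a polling loop with a deadline. Milvus itself is written in Go, so this Python version only illustrates the idea; `wait_for_load`, `LoadTimeoutError`, and the `get_progress` callable are all hypothetical names.

```python
import time


class LoadTimeoutError(Exception):
    """Raised when a load does not finish before the deadline."""


def wait_for_load(get_progress, timeout_s, poll_interval_s=0.5,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll get_progress() (0-100) until it reaches 100 or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        progress = get_progress()
        if progress >= 100:
            return progress
        if clock() >= deadline:
            raise LoadTimeoutError(f"load stuck at {progress}% after {timeout_s}s")
        sleep(poll_interval_s)


# A loader that never makes progress, like replica_1 after its only node went down.
try:
    wait_for_load(lambda: 0, timeout_s=0.2, poll_interval_s=0.05)
except LoadTimeoutError as exc:
    print(f"gave up: {exc}")
```

Raising an explicit error lets the caller report the failed load instead of leaving the collection in a loading state indefinitely.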

Needs improvement:

  1. Enable balancing of nodes between replicas.
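
Balancing nodes between replicas could look something like the sketch below: a replica left without nodes borrows one from the replica that has the most. This is one possible policy chosen for illustration, not the one Milvus implements, and `balance_replicas` is a hypothetical name.

```python
def balance_replicas(replicas):
    """Give each empty replica one node taken from the largest replica.

    `replicas` maps a replica name to the set of live query nodes assigned to it.
    Returns a new mapping; the input is left unchanged.
    """
    balanced = {name: set(nodes) for name, nodes in replicas.items()}
    for name, nodes in balanced.items():
        if nodes:
            continue
        donor = max(balanced, key=lambda replica: len(balanced[replica]))
        if len(balanced[donor]) < 2:
            break  # no replica can spare a node without becoming empty itself
        moved = min(balanced[donor])  # deterministic choice for the example
        balanced[donor].remove(moved)
        nodes.add(moved)
    return balanced


# State after query node A went down: replica_1 is empty, replica_2 has B and C.
print(balance_replicas({"replica_1": set(), "replica_2": {"B", "C"}}))
```

With three nodes and two replicas, moving B or C into replica_1 would let both replicas finish loading, provided each node has enough memory for its share of the segments.
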

weiliu1031 commented 1 year ago

Further discussion:

  1. We should ensure that a collection loaded before the restart is also loaded after the restart, so the collection recovery process should not time out.
  2. For this case, we should prevent a query node from going down and never coming back. Then, with enough resources, all collections should load successfully.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

yanliang567 commented 1 year ago

@elstic is this still an issue?

elstic commented 1 year ago

> @elstic is this still an issue

This issue did not occur in version 2.2.3, so I will close it.

elstic commented 1 year ago

/close

sre-ci-robot commented 1 year ago

@elstic: Closing this issue.

In response to [this](https://github.com/milvus-io/milvus/issues/20761#issuecomment-1425105871):

>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.