milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [perf-nightly] Milvus concurrent query failed on sift-1m dataset with HNSW index #26282

Closed: jingkl closed this issue 1 year ago

jingkl commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: 2.2.0-20230811-42cb5d12
- Deployment mode (standalone or cluster): standalone
- MQ type (rocksmq, pulsar or kafka): rocksmq
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Current Behavior

Server pods:

NAME                                                              READY   STATUS                   RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
perf-query-169128200-1-47-8113-etcd-0                             1/1     Running                  0                55m     10.104.6.100    4am-node13   <none>           <none>
perf-query-169128200-1-47-8113-etcd-1                             1/1     Running                  0                55m     10.104.4.66     4am-node11   <none>           <none>
perf-query-169128200-1-47-8113-etcd-2                             1/1     Running                  0                55m     10.104.14.172   4am-node18   <none>           <none>
perf-query-169128200-1-47-8113-milvus-datacoord-9cd5b597b-7m6mv   1/1     Running                  1 (51m ago)      55m     10.104.17.218   4am-node23   <none>           <none>
perf-query-169128200-1-47-8113-milvus-datanode-7f576774b7-mndsr   1/1     Running                  2 (48m ago)      55m     10.104.5.20     4am-node12   <none>           <none>
perf-query-169128200-1-47-8113-milvus-indexcoord-747964bccw7gfv   1/1     Running                  2 (47m ago)      55m     10.104.4.41     4am-node11   <none>           <none>
perf-query-169128200-1-47-8113-milvus-indexnode-7565796dbbp4822   1/1     Running                  1 (51m ago)      55m     10.104.24.229   4am-node29   <none>           <none>
perf-query-169128200-1-47-8113-milvus-proxy-66b5d659c9-xwmb5      1/1     Running                  2 (47m ago)      55m     10.104.16.236   4am-node21   <none>           <none>
perf-query-169128200-1-47-8113-milvus-querycoord-c45657f74pqsl6   1/1     Running                  2 (47m ago)      55m     10.104.16.237   4am-node21   <none>           <none>
perf-query-169128200-1-47-8113-milvus-querynode-98cc5cff6-954g6   1/1     Running                  1 (51m ago)      55m     10.104.1.228    4am-node10   <none>           <none>
perf-query-169128200-1-47-8113-milvus-rootcoord-84b7b99784t6wdx   1/1     Running                  3 (44m ago)      55m     10.104.16.238   4am-node21   <none>           <none>
perf-query-169128200-1-47-8113-minio-0                            1/1     Running                  0                55m     10.104.21.155   4am-node24   <none>           <none>
perf-query-169128200-1-47-8113-minio-1                            1/1     Running                  0                55m     10.104.6.102    4am-node13   <none>           <none>
perf-query-169128200-1-47-8113-minio-2                            1/1     Running                  0                55m     10.104.4.67     4am-node11   <none>           <none>
perf-query-169128200-1-47-8113-minio-3                            1/1     Running                  0                55m     10.104.13.50    4am-node16   <none>           <none>
perf-query-169128200-1-47-8113-pulsar-bookie-0                    1/1     Running                  0                55m     10.104.13.48    4am-node16   <none>           <none>
perf-query-169128200-1-47-8113-pulsar-bookie-1                    1/1     Running                  0                55m     10.104.14.168   4am-node18   <none>           <none>
perf-query-169128200-1-47-8113-pulsar-bookie-2                    1/1     Running                  0                55m     10.104.4.69     4am-node11   <none>           <none>
perf-query-169128200-1-47-8113-pulsar-bookie-init-l7znm           0/1     Completed                0                55m     10.104.4.39     4am-node11   <none>           <none>
perf-query-169128200-1-47-8113-pulsar-broker-0                    1/1     Running                  0                55m     10.104.4.40     4am-node11   <none>           <none>
perf-query-169128200-1-47-8113-pulsar-proxy-0                     1/1     Running                  0                55m     10.104.16.239   4am-node21   <none>           <none>
perf-query-169128200-1-47-8113-pulsar-pulsar-init-r8h4c           0/1     Completed                0                55m     10.104.16.235   4am-node21   <none>           <none>
perf-query-169128200-1-47-8113-pulsar-recovery-0                  1/1     Running                  0                55m     10.104.6.77     4am-node13   <none>           <none>
perf-query-169128200-1-47-8113-pulsar-zookeeper-0                 1/1     Running                  0                55m     10.104.17.221   4am-node23   <none>           <none>
perf-query-169128200-1-47-8113-pulsar-zookeeper-1                 1/1     Running                  0                52m     10.104.5.81     4am-node12   <none>           <none>
perf-query-169128200-1-47-8113-pulsar-zookeeper-2                 1/1     Running                  0                47m     10.104.9.52     4am-node14   <none>           <none> 
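To see at a glance which pods restarted during the run, the RESTARTS column (the fourth whitespace-separated field) can be pulled out of the listing. This is a minimal sketch over a few rows copied from the listing above; `restarted_pods` is a hypothetical helper, and it assumes kubectl-style columns where RESTARTS is either `0` or `N (Xm ago)`. Applied to the full listing, it would flag every milvus-* pod (datacoord, datanode, indexcoord, indexnode, proxy, querycoord, querynode, rootcoord) with 1 to 3 restarts, while the etcd, minio and pulsar pods show none.

```python
# A few rows copied verbatim from the pod listing above.
POD_LISTING = """\
perf-query-169128200-1-47-8113-etcd-0                             1/1     Running                  0                55m     10.104.6.100    4am-node13   <none>           <none>
perf-query-169128200-1-47-8113-milvus-datacoord-9cd5b597b-7m6mv   1/1     Running                  1 (51m ago)      55m     10.104.17.218   4am-node23   <none>           <none>
perf-query-169128200-1-47-8113-milvus-rootcoord-84b7b99784t6wdx   1/1     Running                  3 (44m ago)      55m     10.104.16.238   4am-node21   <none>           <none>
perf-query-169128200-1-47-8113-minio-0                            1/1     Running                  0                55m     10.104.21.155   4am-node24   <none>           <none>
"""


def restarted_pods(listing: str) -> dict:
    """Map pod name -> restart count for pods that restarted at least once.

    Assumes whitespace-separated kubectl columns (NAME READY STATUS RESTARTS ...),
    where RESTARTS is "0" or "N (Xm ago)", so the count is always field 3.
    """
    restarts = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[3].isdigit():
            continue  # skip blank lines and a header row, if present
        if int(fields[3]) > 0:
            restarts[fields[0]] = int(fields[3])
    return restarts


for name, count in restarted_pods(POD_LISTING).items():
    print(f"{name}: {count} restart(s)")
```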

Client log:

[2023-08-11 05:01:43,412 - ERROR - fouram]: RPC error: [query], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)>, <Time:{'RPC start': '2023-08-11 05:00:43.410682', 'RPC error': '2023-08-11 05:01:43.412020'}> (decorators.py:108)
[2023-08-11 05:01:43,414 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)> (api_request.py:53)
[2023-08-11 05:01:43,414 - ERROR - fouram]: [CheckFunc] query request check failed, response:<MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)> (func_check.py:46)
[2023-08-11 05:01:43,415 - ERROR - fouram]: RPC error: [query], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)>, <Time:{'RPC start': '2023-08-11 05:00:43.411783', 'RPC error': '2023-08-11 05:01:43.415127'}> (decorators.py:108)
[2023-08-11 05:01:43,415 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)> (api_request.py:53)
[2023-08-11 05:01:43,415 - ERROR - fouram]: [CheckFunc] query request check failed, response:<MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)> (func_check.py:46)
[2023-08-11 05:01:43,416 - ERROR - fouram]: RPC error: [query], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)>, <Time:{'RPC start': '2023-08-11 05:00:43.412965', 'RPC error': '2023-08-11 05:01:43.416145'}> (decorators.py:108)
[2023-08-11 05:01:43,416 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)> (api_request.py:53)
[2023-08-11 05:01:43,416 - ERROR - fouram]: [CheckFunc] query request check failed, response:<MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)> (func_check.py:46)
[2023-08-11 05:01:43,417 - ERROR - fouram]: RPC error: [query], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)>, <Time:{'RPC start': '2023-08-11 05:00:43.412377', 'RPC error': '2023-08-11 05:01:43.417051'}> (decorators.py:108)
[2023-08-11 05:01:43,417 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)> (api_request.py:53)
[2023-08-11 05:01:43,417 - ERROR - fouram]: [CheckFunc] query request check failed, response:<MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)> (func_check.py:46)
[2023-08-11 05:01:43,417 - ERROR - fouram]: RPC error: [query], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)>, <Time:{'RPC start': '2023-08-11 05:00:43.414073', 'RPC error': '2023-08-11 05:01:43.417960'}> (decorators.py:108)
[2023-08-11 05:01:43,418 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)> (api_request.py:53)
[2023-08-11 05:01:43,418 - ERROR - fouram]: [CheckFunc] query request check failed, response:<MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)> (func_check.py:46)
[2023-08-11 05:01:43,418 - ERROR - fouram]: RPC error: [query], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)>, <Time:{'RPC start': '2023-08-11 05:00:43.413541', 'RPC error': '2023-08-11 05:01:43.418873'}> (decorators.py:108)
[2023-08-11 05:01:43,419 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)> (api_request.py:53)
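Each `RPC error` line records when the query was sent (`RPC start`) and when it failed (`RPC error`). The sketch below, standard library only, extracts both timestamps and computes the elapsed time; `rpc_durations` is a hypothetical helper and the line format is taken from the log above. For all six `RPC error` lines in the excerpt the gap is about 60.00 s, which matches the `Retry timeout: 60s` in the message and suggests the queries waited until the deadline rather than failing fast.

```python
import re
from datetime import datetime

# Lines copied from the client log above; only "RPC error" lines carry timings.
LOG_LINES = [
    "[2023-08-11 05:01:43,412 - ERROR - fouram]: RPC error: [query], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)>, <Time:{'RPC start': '2023-08-11 05:00:43.410682', 'RPC error': '2023-08-11 05:01:43.412020'}> (decorators.py:108)",
    "[2023-08-11 05:01:43,414 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)> (api_request.py:53)",
    "[2023-08-11 05:01:43,415 - ERROR - fouram]: RPC error: [query], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)>, <Time:{'RPC start': '2023-08-11 05:00:43.411783', 'RPC error': '2023-08-11 05:01:43.415127'}> (decorators.py:108)",
]

TIME_RE = re.compile(r"'RPC start': '([^']+)', 'RPC error': '([^']+)'")
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def rpc_durations(lines):
    """Yield seconds elapsed between 'RPC start' and 'RPC error' for each timed line."""
    for line in lines:
        match = TIME_RE.search(line)
        if match is None:
            continue  # api_response / CheckFunc lines have no timing block
        start, end = (datetime.strptime(ts, TS_FORMAT) for ts in match.groups())
        yield (end - start).total_seconds()


for seconds in rpc_durations(LOG_LINES):
    print(f"{seconds:.3f} s")
```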

Expected Behavior

No response

Steps To Reproduce

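1. Create a collection.
2. Build an index on the vector column.
3. Insert the dataset (sift-1m: 1,000,000 128-dim vectors, in batches of 50,000 per the ni_per setting).
4. Flush the collection.
5. Build the index on the vector column again with the same parameters.
6. Count the total number of rows.
7. Load the collection.
8. Run concurrent queries; they fail with "rpc deadline exceeded: Retry timeout: 60s".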
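Step 8 is driven by a locust-based harness in the original test (`test_concurrent_locust_custom_parameters`). The sketch below swaps locust for a plain `ThreadPoolExecutor` and replaces the pymilvus call with a hypothetical `fake_query` stub, just to show the shape of the check: N concurrent callers each issue the same query under a per-call deadline, and outcomes are tallied as `ok` or `timeout`. The fixed-latency model is an illustrative assumption, not a diagnosis of this bug.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Values taken from the test config in this issue.
EXPR = "id in [1, 100, 1000]"  # query expression
CONCURRENCY = 100              # concurrent_number
DEADLINE_S = 60.0              # matches "Retry timeout: 60s" in the client log


def fake_query(expr: str, timeout: float, latency: float) -> list:
    """Hypothetical stand-in for a pymilvus query call (e.g. Collection.query
    with a timeout argument). Pretends the server needs `latency` seconds;
    if that exceeds `timeout`, it fails like the client log. No real sleeping."""
    if latency > timeout:
        raise TimeoutError(f"rpc deadline exceeded: Retry timeout: {timeout:g}s")
    return [{"id": 1}, {"id": 100}, {"id": 1000}]


def run_concurrent_queries(n_users: int, timeout: float, latency: float) -> Counter:
    """Issue one query per simulated user in parallel and tally the outcomes."""
    def one_call(_: int) -> str:
        try:
            fake_query(EXPR, timeout=timeout, latency=latency)
            return "ok"
        except TimeoutError:
            return "timeout"

    with ThreadPoolExecutor(max_workers=n_users) as pool:
        return Counter(pool.map(one_call, range(n_users)))


print(run_concurrent_queries(CONCURRENCY, DEADLINE_S, latency=0.2))
print(run_concurrent_queries(CONCURRENCY, DEADLINE_S, latency=75.0))
```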

Milvus Log

No response

Anything else?

'client': {'test_case_type': 'ConcurrentClientBase',
           'test_case_name': 'test_concurrent_locust_custom_parameters',
           'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                   'dim': 128,
                                                   'dataset_name': 'sift',
                                                   'dataset_size': '1m',
                                                   'ni_per': 50000},
                                'load_params': {},
                                'search_params': {},
                                'resource_groups_params': {'reset': False},
                                'index_params': {'index_type': 'HNSW',
                                                 'index_param': {'M': 8,
                                                                 'efConstruction': 200}},
                                'concurrent_params': {'concurrent_number': 100,
                                                      'during_time': 600,
                                                      'interval': 20,
                                                      'spawn_rate': None},
                                'concurrent_tasks': [{'type': 'query',
                                                      'weight': 1,
                                                      'params': {'expr': 'id in [1, 100, 1000]'}}]}}
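The index and dataset sections of this config map onto pymilvus arguments roughly as follows. This is a minimal sketch, assuming the pymilvus 2.2 `create_index(field_name, index_params)` dict layout with `index_type`, `metric_type` and `params` keys; the helper names and the reading of `'1m'` as 1,000,000 rows are illustrative assumptions, not the fouram framework's own code.

```python
# Subset of test_case_params from the config above.
TEST_CASE_PARAMS = {
    "dataset_params": {"metric_type": "L2", "dim": 128, "dataset_name": "sift",
                       "dataset_size": "1m", "ni_per": 50000},
    "index_params": {"index_type": "HNSW",
                     "index_param": {"M": 8, "efConstruction": 200}},
    "concurrent_tasks": [{"type": "query", "weight": 1,
                          "params": {"expr": "id in [1, 100, 1000]"}}],
}


def to_pymilvus_index_params(params: dict) -> dict:
    """Assumed mapping onto the index_params dict passed to create_index()."""
    index = params["index_params"]
    return {
        "index_type": index["index_type"],
        "metric_type": params["dataset_params"]["metric_type"],
        "params": dict(index["index_param"]),
    }


def insert_batch_count(dataset_size: str, ni_per: int) -> int:
    """Number of insert calls, reading '1m' as 1,000,000 rows (assumption)."""
    multipliers = {"k": 1_000, "m": 1_000_000}
    rows = int(dataset_size[:-1]) * multipliers[dataset_size[-1].lower()]
    return -(-rows // ni_per)  # ceiling division


dataset = TEST_CASE_PARAMS["dataset_params"]
print(to_pymilvus_index_params(TEST_CASE_PARAMS))
print(insert_batch_count(dataset["dataset_size"], dataset["ni_per"]))
print(TEST_CASE_PARAMS["concurrent_tasks"][0]["params"]["expr"])
```

Under these assumptions the run inserts the 1,000,000 vectors in 20 batches and builds HNSW (M=8, efConstruction=200) with the L2 metric. If `id` is the primary key, the query expression is a lookup of three ids, so each request should return at most three rows.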
yanliang567 commented 1 year ago

/assign @czs007 /unassign

czs007 commented 1 year ago

[three screenshots]

jingkl commented 1 year ago

Tested with image master-20231008-a7151653-amd64: the problem did not reproduce, so closing the issue for now.