milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [benchmark][standalone] Concurrency is 100, Milvus search failed #23324

Status: Closed (elstic closed 1 year ago)

elstic commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: 2.2.0-20230410-58eb118a
- Deployment mode (standalone or cluster): standalone
- MQ type (rocksmq, pulsar or kafka):
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS):
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Case: test_concurrent_locust_ivf_sq8_search_standalone
Argo task: fouramf-concurrent-8svs2, id: 2

server:

fouramf-concurrent-8svs2-2-etcd-0                                 1/1     Running            0              2m25s   10.104.6.100    4am-node13   <none>           <none>
fouramf-concurrent-8svs2-2-milvus-standalone-8487bdd875-j4dw8     1/1     Running            0              2m25s   10.104.5.213    4am-node12   <none>           <none>
fouramf-concurrent-8svs2-2-minio-db79b7fb9-f8lkg                  1/1     Running            0              2m25s   10.104.6.99     4am-node13   <none>           <none>

client log: test_concurrent_locust_ivf_sq8_search_standalone.zip

Expected Behavior

No response

Steps To Reproduce

1. create a collection or use an existing collection
2. build index on vector column
3. insert a certain number of vectors
4. flush collection
5. build index on vector column with the same parameters
6. build index on scalar columns or not
7. count the total number of rows
8. load collection
9. perform concurrent operations
10. clean all collections or not

Milvus Log

No response

Anything else?

No response

yanliang567 commented 1 year ago

@elstic @jiaoew1991 is it a dup of #23331?

/assign @jiaoew1991 /unassign

elstic commented 1 year ago

> @elstic @jiaoew1991 is it a dup of #23331?

The two issues are similar and report similar errors, but there are differences: #23331 is a cluster deployment with a FLAT index, while this issue is a standalone deployment with an IVF_SQ8 index.

jiaoew1991 commented 1 year ago

/assign @yah01 /unassign

yah01 commented 1 year ago

@elstic could you help check whether the problem still exists, now that #23331 has been fixed?

elstic commented 1 year ago

> @elstic could you help check whether the problem still exists, now that #23331 has been fixed?

This issue still exists. Here is a brief description of what I did:

  1. Initialize a Milvus instance with a resource limit of 6 CPU cores and 6 GiB of memory (6c6g).
  2. Insert 1 million vectors, then build the index and load the collection.
  3. Run searches with a concurrency of 100.
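
For readers who want to see the shape of step 3, here is a minimal sketch of a concurrent driver that counts successes and failures. It is not the benchmark's Locust code: `run_concurrent` and `stub_search` are hypothetical names, and the stub only sleeps where a real run would call the Milvus client's search.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_concurrent(search_fn, concurrency, requests_per_worker):
    """Call search_fn from `concurrency` threads and count the outcomes."""

    def worker(worker_id):
        ok = failed = 0
        for _ in range(requests_per_worker):
            try:
                search_fn()
                ok += 1
            except Exception:
                # Any exception (timeout, server error) counts as one failure.
                failed += 1
        return ok, failed

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(worker, range(concurrency)))
    elapsed = time.perf_counter() - start

    ok = sum(r[0] for r in results)
    failed = sum(r[1] for r in results)
    return {"requests": ok + failed, "fails": failed, "elapsed_s": elapsed}


def stub_search():
    """Hypothetical placeholder; a real run would issue a Milvus search here."""
    time.sleep(0.001)


stats = run_concurrent(stub_search, concurrency=100, requests_per_worker=5)
print(stats)
```

Swapping `stub_search` for a function that performs an actual search (and raises on timeout) would give the same kind of request and failure counts that the test result reports.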

I reran the case. Client test result:

{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'standalone',
            'config_name': 'standalone_6c6m',
            'config': {'standalone': {'resources': {'limits': {'cpu': '6.0',
                                                               'memory': '6Gi'},
                                                    'requests': {'cpu': '4.0',
                                                                 'memory': '4Gi'}},
                                      'persistence': {'persistentVolumeClaim': {'storageClass': 'local-path'}}},
                       'cluster': {'enabled': False},
                       'etcd': {'replicaCount': 1,
                                'global': {'storageClass': 'local-path'},
                                'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'minio': {'mode': 'standalone',
                                 'persistence': {'storageClass': 'local-path'},
                                 'metrics': {'podMonitor': {'enabled': True}}},
                       'pulsar': {'enabled': False},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': '2.2.0-20230801-53a23fc8'}}},
            'host': 'fouramf-qgcgz-15-6294-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_concurrent_locust_ivf_sq8_search_standalone',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'dataset_name': 'sift',
                                                    'dataset_size': 1000000,
                                                    'ni_per': 50000},
                                 'collection_params': {'other_fields': [],
                                                       'shards_num': 2},
                                 'load_params': {},
                                 'query_params': {},
                                 'search_params': {},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'IVF_SQ8',
                                                  'index_param': {'nlist': 1024}},
                                 'concurrent_params': {'concurrent_number': 100,
                                                       'during_time': 1800,
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'search',
                                                       'weight': 1,
                                                       'params': {'nq': 10000,
                                                                  'top_k': 10,
                                                                  'search_param': {'nprobe': 16},
                                                                  'expr': None,
                                                                  'guarantee_timestamp': None,
                                                                  'output_fields': None,
                                                                  'ignore_growing': False,
                                                                  'timeout': 60,
                                                                  'random_data': True}}]},
            'run_id': 2023080270407991,
            'datetime': '2023-08-02 09:04:00.312036',
            'client_version': '2.2'},
 'result': {'test_result': {'index': {'RT': 28.7186},
                            'insert': {'total_time': 38.689,
                                       'VPS': 25847.14,
                                       'batch_time': 1.9344,
                                       'batch': 50000},
                            'flush': {'RT': 4.5285},
                            'load': {'RT': 3.5325},
                            'Locust': {'Aggregated': {'Requests': 1360,
                                                      'Fails': 1276,
                                                      'RPS': 0.75,
                                                      'fail_s': 0.94,
                                                      'RT_max': 130865.29,
                                                      'RT_avg': 116947.87,
                                                      'TP50': 118000.0,
                                                      'TP99': 129000.0},
                                       'search': {'Requests': 1360,
                                                  'Fails': 1276,
                                                  'RPS': 0.75,
                                                  'fail_s': 0.94,
                                                  'RT_max': 130865.29,
                                                  'RT_avg': 116947.87,
                                                  'TP50': 118000.0,
                                                  'TP99': 129000.0}}}}}
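
As a sanity check on the Locust numbers above, the failure ratio and request rate can be recomputed from the raw counts. This is only a sketch: the values are copied from the result dict, and the payload estimate assumes 4-byte float32 vector components, which the report does not state.

```python
# Values copied from the test result above.
requests = 1360
fails = 1276
during_time_s = 1800   # 'during_time' in concurrent_params
concurrency = 100      # 'concurrent_number'
nq = 10000             # query vectors per search request
dim = 128
timeout_s = 60         # 'timeout' in the search task params
rt_avg_ms = 116947.87  # 'RT_avg'

fail_ratio = fails / requests     # fraction of requests that failed
rps = requests / during_time_s    # requests per second over the configured run time
successes = requests - fails

# Assumption: each vector component is a 4-byte float32.
bytes_per_request = nq * dim * 4
in_flight_vectors = concurrency * nq

print(f"fail ratio:            {fail_ratio:.2f}")
print(f"requests per second:   {rps:.2f}")
print(f"successful requests:   {successes}")
print(f"payload per request:   {bytes_per_request / 1e6:.2f} MB")
print(f"query vectors at once: {in_flight_vectors}")
print(f"avg RT vs timeout:     {rt_avg_ms / 1000:.1f} s vs {timeout_s} s")
```

The recomputed failure ratio matches the reported `fail_s` of 0.94, and the request rate is close to the reported RPS of 0.75. The average response time is well above the configured timeout value, and each of the 100 clients sends 10,000 query vectors per request. Both are observations from the numbers, not a diagnosis.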

server:

fouramf-qgcgz-15-6294-etcd-0                                      1/1     Running     0                 33m     10.104.1.102    4am-node10   <none>           <none>
fouramf-qgcgz-15-6294-milvus-standalone-7c74ff88bc-vwb7c          1/1     Running     0                 33m     10.104.20.120   4am-node22   <none>           <none>
fouramf-qgcgz-15-6294-minio-866978d8f7-tvlwr                      1/1     Running     0                 33m     10.104.20.119   4am-node22   <none>           <none>

client error log (at 100 concurrency, search almost always fails): image

CPU and memory usage: image image

Image: 2.2.0-20230801-53a23fc8

This is almost the same as https://github.com/milvus-io/milvus/issues/25396: both are concurrent searches, except that that case is a cluster deployment.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.