milvus-io / milvus

A cloud-native vector database, storage for next-generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [benchmark][cluster] Serial insert failed `fail to produce insert msg` #32491

Open wangting0128 opened 3 months ago

wangting0128 commented 3 months ago

Is there an existing issue for this?

Environment

- Milvus version: 2.4-20240418-238f9a4a-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar    
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.0rc66
- OS(Ubuntu or CentOS):  
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: inverted-corn-1713628800
test case: test_inverted_locust_varchar_dql_cluster

server:

NAME                                                              READY   STATUS                            RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-corn-128800-4-27-4946-etcd-0                             1/1     Running                           0               3h13m   10.104.28.125   4am-node33   <none>           <none>
inverted-corn-128800-4-27-4946-etcd-1                             1/1     Running                           0               3h13m   10.104.18.177   4am-node25   <none>           <none>
inverted-corn-128800-4-27-4946-etcd-2                             1/1     Running                           0               3h13m   10.104.24.90    4am-node29   <none>           <none>
inverted-corn-128800-4-27-4946-milvus-datacoord-84dd8dcc4-mnrs5   1/1     Running                           0               3h13m   10.104.15.159   4am-node20   <none>           <none>
inverted-corn-128800-4-27-4946-milvus-datanode-cd956f86b-mh99w    1/1     Running                           1 (3h8m ago)    3h13m   10.104.13.240   4am-node16   <none>           <none>
inverted-corn-128800-4-27-4946-milvus-indexcoord-76b9764b675b2g   1/1     Running                           0               3h13m   10.104.15.160   4am-node20   <none>           <none>
inverted-corn-128800-4-27-4946-milvus-indexnode-7897c8ffd89hcxc   1/1     Running                           0               3h13m   10.104.16.188   4am-node21   <none>           <none>
inverted-corn-128800-4-27-4946-milvus-proxy-5fffd5b4b9-4slcd      1/1     Running                           1 (3h8m ago)    3h13m   10.104.15.163   4am-node20   <none>           <none>
inverted-corn-128800-4-27-4946-milvus-querycoord-5c55bff4cmp89m   1/1     Running                           1 (3h8m ago)    3h13m   10.104.15.158   4am-node20   <none>           <none>
inverted-corn-128800-4-27-4946-milvus-querynode-86d8d86775fq2vd   1/1     Running                           0               3h13m   10.104.15.161   4am-node20   <none>           <none>
inverted-corn-128800-4-27-4946-milvus-rootcoord-86c7f6cbdbwmqpk   1/1     Running                           1 (3h8m ago)    3h13m   10.104.15.156   4am-node20   <none>           <none>
inverted-corn-128800-4-27-4946-minio-0                            1/1     Running                           0               3h13m   10.104.28.123   4am-node33   <none>           <none>
inverted-corn-128800-4-27-4946-minio-1                            1/1     Running                           0               3h13m   10.104.30.149   4am-node38   <none>           <none>
inverted-corn-128800-4-27-4946-minio-2                            1/1     Running                           0               3h13m   10.104.18.178   4am-node25   <none>           <none>
inverted-corn-128800-4-27-4946-minio-3                            1/1     Running                           0               3h13m   10.104.24.87    4am-node29   <none>           <none>
inverted-corn-128800-4-27-4946-pulsar-bookie-0                    1/1     Running                           0               3h13m   10.104.26.84    4am-node32   <none>           <none>
inverted-corn-128800-4-27-4946-pulsar-bookie-1                    1/1     Running                           0               3h13m   10.104.30.150   4am-node38   <none>           <none>
inverted-corn-128800-4-27-4946-pulsar-bookie-2                    1/1     Running                           0               3h13m   10.104.28.129   4am-node33   <none>           <none>
inverted-corn-128800-4-27-4946-pulsar-bookie-init-xbzt6           0/1     Completed                         0               3h13m   10.104.1.191    4am-node10   <none>           <none>
inverted-corn-128800-4-27-4946-pulsar-broker-0                    1/1     Running                           0               3h13m   10.104.1.192    4am-node10   <none>           <none>
inverted-corn-128800-4-27-4946-pulsar-proxy-0                     1/1     Running                           0               3h13m   10.104.5.195    4am-node12   <none>           <none>
inverted-corn-128800-4-27-4946-pulsar-pulsar-init-rw88b           0/1     Completed                         0               3h13m   10.104.5.197    4am-node12   <none>           <none>
inverted-corn-128800-4-27-4946-pulsar-recovery-0                  1/1     Running                           0               3h13m   10.104.9.214    4am-node14   <none>           <none>
inverted-corn-128800-4-27-4946-pulsar-zookeeper-0                 1/1     Running                           0               3h13m   10.104.28.119   4am-node33   <none>           <none>
inverted-corn-128800-4-27-4946-pulsar-zookeeper-1                 1/1     Running                           0               3h12m   10.104.32.61    4am-node39   <none>           <none>
inverted-corn-128800-4-27-4946-pulsar-zookeeper-2                 1/1     Running                           0               3h10m   10.104.34.57    4am-node37   <none>           <none> 
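In the listing above, datanode, proxy, querycoord and rootcoord each show one restart "3h8m ago" on pods that are 3h13m old, i.e. roughly five minutes after deployment. To compare restarts across runs, a small helper like the one below can pull those rows out of `kubectl get pods -o wide` output. It is a sketch that assumes the default column order shown above (NAME, READY, STATUS, RESTARTS, ...); `pods_with_restarts` is a hypothetical name.

```python
from __future__ import annotations


def pods_with_restarts(kubectl_output: str) -> list[tuple[str, int]]:
    """Return (pod name, restart count) for every pod that restarted at least once.

    Assumes the default `kubectl get pods -o wide` column order:
    NAME READY STATUS RESTARTS AGE ...
    """
    restarted = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # A restarted pod shows e.g. "1 (3h8m ago)"; after split() the count is fields[3].
        count = int(fields[3])
        if count > 0:
            restarted.append((fields[0], count))
    return restarted


sample = """\
NAME                                                              READY   STATUS      RESTARTS       AGE
inverted-corn-128800-4-27-4946-milvus-proxy-5fffd5b4b9-4slcd      1/1     Running     1 (3h8m ago)   3h13m
inverted-corn-128800-4-27-4946-milvus-querynode-86d8d86775fq2vd   1/1     Running     0              3h13m
"""
print(pods_with_restarts(sample))
```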


client pod name: inverted-corn-1713628800-658679405
client log: [screenshot]

Expected Behavior

No response

Steps To Reproduce

:test steps:
            1. create collection with fields: 'float_vector','varchar_1','varchar_2','varchar_3'
            2. build indexes:
                IVF_FLAT: 'float_vector'
                INVERTED: 'varchar_1', 'varchar_2', 'varchar_3'
            3. insert 300k rows  <- insert failed
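"Serial insert" here presumably means the 300k rows are sent in batches, one after another, and the run stops at the first error. A minimal sketch of that loop, useful for pinning down which batch hits `fail to produce insert msg`, is below. `serial_insert` and `insert_fn` are hypothetical names: `insert_fn` stands in for whatever issues the insert RPC (for example a wrapper around pymilvus `Collection.insert`), and the 10,000-row batch size is an assumption borrowed from the `ni_per` value in the second run's report, since this run's batch size is not shown.

```python
from typing import Callable, Optional, Sequence


def serial_insert(rows: Sequence, batch_size: int,
                  insert_fn: Callable[[Sequence], None]) -> Optional[int]:
    """Insert rows one batch at a time.

    Returns the index of the first batch whose insert raised, or None if all succeeded.
    """
    for batch_idx, start in enumerate(range(0, len(rows), batch_size)):
        batch = rows[start:start + batch_size]
        try:
            insert_fn(batch)  # hypothetical stand-in for the real insert call
        except Exception as exc:
            print(f"batch {batch_idx} (rows {start}..{start + len(batch) - 1}) failed: {exc}")
            return batch_idx
    return None


# Usage with a fake insert that fails on the third batch, mimicking a server-side error.
calls = []

def fake_insert(batch):
    calls.append(len(batch))
    if len(calls) == 3:
        raise RuntimeError("fail to produce insert msg")

failed = serial_insert(list(range(300_000)), 10_000, fake_insert)
```

Stopping at the first failure (rather than retrying) keeps the sketch faithful to a serial benchmark, where the goal is to record where the error first appears.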

Milvus Log

No response

Anything else?

No response

xiaofan-luan commented 3 months ago

Seems not to be a Milvus issue. /assign @LoveEachDay

wangting0128 commented 3 months ago

Different scene, same error.

argo task: multi-vector-scene-mix-gkx8n
test case: test_hybrid_search_locust_ddl_dql_cluster
image: 2.4-20240418-238f9a4a-amd64

server:

NAME                                                              READY   STATUS                            RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
multi-vector-scene-mix-gkx8n-5-etcd-0                             1/1     Running                           0                12h     10.104.30.184   4am-node38   <none>           <none>
multi-vector-scene-mix-gkx8n-5-etcd-1                             1/1     Running                           0                12h     10.104.27.204   4am-node31   <none>           <none>
multi-vector-scene-mix-gkx8n-5-etcd-2                             1/1     Running                           0                12h     10.104.21.150   4am-node24   <none>           <none>
multi-vector-scene-mix-gkx8n-5-milvus-datacoord-d46685c55-8wk5k   1/1     Running                           0                12h     10.104.1.220    4am-node10   <none>           <none>
multi-vector-scene-mix-gkx8n-5-milvus-datanode-5746c47f4b-xnwnp   1/1     Running                           1 (12h ago)      12h     10.104.1.221    4am-node10   <none>           <none>
multi-vector-scene-mix-gkx8n-5-milvus-indexcoord-569969dd997ndm   1/1     Running                           0                12h     10.104.13.24    4am-node16   <none>           <none>
multi-vector-scene-mix-gkx8n-5-milvus-indexnode-bb8db9685-6b95x   1/1     Running                           0                12h     10.104.25.42    4am-node30   <none>           <none>
multi-vector-scene-mix-gkx8n-5-milvus-indexnode-bb8db9685-cn9k9   1/1     Running                           0                12h     10.104.15.211   4am-node20   <none>           <none>
multi-vector-scene-mix-gkx8n-5-milvus-indexnode-bb8db9685-gwxhf   1/1     Running                           1 (12h ago)      12h     10.104.13.26    4am-node16   <none>           <none>
multi-vector-scene-mix-gkx8n-5-milvus-indexnode-bb8db9685-wgvvw   1/1     Running                           1 (12h ago)      12h     10.104.19.198   4am-node28   <none>           <none>
multi-vector-scene-mix-gkx8n-5-milvus-proxy-54bcfd846b-9lp5s      1/1     Running                           1 (12h ago)      12h     10.104.13.25    4am-node16   <none>           <none>
multi-vector-scene-mix-gkx8n-5-milvus-querycoord-86d64884dh5264   1/1     Running                           1 (12h ago)      12h     10.104.28.17    4am-node33   <none>           <none>
multi-vector-scene-mix-gkx8n-5-milvus-querynode-797ff87699hh7c5   1/1     Running                           0                12h     10.104.20.61    4am-node22   <none>           <none>
multi-vector-scene-mix-gkx8n-5-milvus-rootcoord-7c495586cfnsq4v   1/1     Running                           1 (12h ago)      12h     10.104.19.199   4am-node28   <none>           <none>
multi-vector-scene-mix-gkx8n-5-minio-0                            1/1     Running                           0                12h     10.104.27.202   4am-node31   <none>           <none>
multi-vector-scene-mix-gkx8n-5-minio-1                            1/1     Running                           0                12h     10.104.21.146   4am-node24   <none>           <none>
multi-vector-scene-mix-gkx8n-5-minio-2                            1/1     Running                           0                12h     10.104.31.33    4am-node34   <none>           <none>
multi-vector-scene-mix-gkx8n-5-minio-3                            1/1     Running                           0                12h     10.104.30.186   4am-node38   <none>           <none>
multi-vector-scene-mix-gkx8n-5-pulsar-bookie-0                    1/1     Running                           0                12h     10.104.27.207   4am-node31   <none>           <none>
multi-vector-scene-mix-gkx8n-5-pulsar-bookie-1                    1/1     Running                           0                12h     10.104.21.151   4am-node24   <none>           <none>
multi-vector-scene-mix-gkx8n-5-pulsar-bookie-2                    1/1     Running                           0                12h     10.104.28.40    4am-node33   <none>           <none>
multi-vector-scene-mix-gkx8n-5-pulsar-bookie-init-5c49g           0/1     Completed                         0                12h     10.104.28.22    4am-node33   <none>           <none>
multi-vector-scene-mix-gkx8n-5-pulsar-broker-0                    1/1     Running                           0                12h     10.104.28.24    4am-node33   <none>           <none>
multi-vector-scene-mix-gkx8n-5-pulsar-proxy-0                     1/1     Running                           0                12h     10.104.29.168   4am-node35   <none>           <none>
multi-vector-scene-mix-gkx8n-5-pulsar-pulsar-init-cv4l5           0/1     Completed                         0                12h     10.104.19.197   4am-node28   <none>           <none>
multi-vector-scene-mix-gkx8n-5-pulsar-recovery-0                  1/1     Running                           0                12h     10.104.34.63    4am-node37   <none>           <none>
multi-vector-scene-mix-gkx8n-5-pulsar-zookeeper-0                 1/1     Running                           0                12h     10.104.26.9     4am-node32   <none>           <none>
multi-vector-scene-mix-gkx8n-5-pulsar-zookeeper-1                 1/1     Running                           0                12h     10.104.24.147   4am-node29   <none>           <none>
multi-vector-scene-mix-gkx8n-5-pulsar-zookeeper-2                 1/1     Running                           0                12h     10.104.31.46    4am-node34   <none>           <none> 

client pod name: multi-vector-scene-mix-gkx8n-847155013
client log:

[2024-04-21 23:26:04,742 - ERROR - fouram]: RPC error: [batch_insert], <MilvusException: (code=65535, message=message send timeout: TimeoutError)>, <Time:{'RPC start': '2024-04-21 23:25:34.646015', 'RPC error': '2024-04-21 23:26:04.742837'}> (decorators.py:146)
[2024-04-21 23:26:04,744 - ERROR - fouram]: (api_response) : [Collection.insert] <MilvusException: (code=65535, message=message send timeout: TimeoutError)>, [requestId: 740277b8-0036-11ef-acd3-5e8c9c28c793] (api_request.py:57)
[2024-04-21 23:26:04,744 - ERROR - fouram]: [CheckFunc] insert request check failed, response:<MilvusException: (code=65535, message=message send timeout: TimeoutError)> (func_check.py:54)
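The `RPC start` and `RPC error` timestamps in the first log line are about 30 s apart, so the insert waited roughly 30 s before failing with `message send timeout`, rather than being rejected immediately. The arithmetic, using only the standard library (`rpc_elapsed_seconds` is a hypothetical helper name):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # format of the 'RPC start' / 'RPC error' fields


def rpc_elapsed_seconds(start: str, error: str) -> float:
    """Seconds between the RPC start and RPC error timestamps."""
    return (datetime.strptime(error, FMT) - datetime.strptime(start, FMT)).total_seconds()


elapsed = rpc_elapsed_seconds("2024-04-21 23:25:34.646015", "2024-04-21 23:26:04.742837")
print(f"{elapsed:.3f} s")  # 30.097 s
```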

test steps:

        concurrent test and calculation of RT and QPS

        :purpose:  `DDL & DQL`
            verify the DDL & DQL scenario on a collection
            with 4 vector fields (IVF_FLAT, HNSW, DISKANN, IVF_SQ8) and scalar fields `int64_1`, `varchar_1`

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim,
                'float_vector_1': 128dim,
                'float_vector_2': 128dim,
                'float_vector_3': 128dim,
                scalar field: int64_1, varchar_1
            2. build indexes:
                IVF_FLAT: 'float_vector'
                HNSW: 'float_vector_1',
                DISKANN: 'float_vector_2'
                IVF_SQ8: 'float_vector_3'
                INVERTED: 'int64_1', 'varchar_1'
                default scalar index: 'id'
            3. insert 1 million rows
            4. flush collection
            5. build indexes again using the same params
            6. load collection
                replica: 1
            7. concurrent requests:
                - scene_test
                    (collection: create->insert->flush->index->drop)
                - search
                - hybrid_search
                - query

test result:

[2024-04-22 06:10:22,621 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-04-22 06:10:22,622 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-04-22 06:10:22,622 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-22 06:10:22,622 -  INFO - fouram]: grpc     hybrid_search                                                                  10162     0(0.00%) |   5574     121   49153   2900 |    0.24        0.00 (stats.py:789)
[2024-04-22 06:10:22,622 -  INFO - fouram]: grpc     query                                                                           9910     0(0.00%) |   2814      71   66499    990 |    0.23        0.00 (stats.py:789)
[2024-04-22 06:10:22,622 -  INFO - fouram]: grpc     scene_test                                                                      9935     1(0.01%) |  65698   31185  175974  64000 |    0.23        0.00 (stats.py:789)
[2024-04-22 06:10:22,622 -  INFO - fouram]: grpc     search                                                                          9988     0(0.00%) |  12588    1943   54490  13000 |    0.23        0.00 (stats.py:789)
[2024-04-22 06:10:22,623 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-22 06:10:22,623 -  INFO - fouram]:          Aggregated                                                                     39995     1(0.00%) |  21577      71  175974   8800 |    0.93        0.00 (stats.py:789)
[2024-04-22 06:10:22,623 -  INFO - fouram]:  (stats.py:790)
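Summing the per-task rows reproduces the `Aggregated` line (39995 requests, 1 failure), and the single failure belongs to `scene_test`, the only task that inserts data (create->insert->flush->index->drop). That is consistent with, though not proof of, the insert timeout shown in the client log. A quick check with the numbers copied from the table:

```python
# (requests, failures) per task, copied from the locust table above
stats = {
    "hybrid_search": (10162, 0),
    "query": (9910, 0),
    "scene_test": (9935, 1),
    "search": (9988, 0),
}
total_requests = sum(reqs for reqs, _ in stats.values())
total_failures = sum(fails for _, fails in stats.values())
failed_tasks = [name for name, (_, fails) in stats.items() if fails > 0]

print(total_requests, total_failures, failed_tasks)  # 39995 1 ['scene_test']
print(f"{100 * total_failures / total_requests:.2f}%")  # 0.00%, matching the Aggregated row
```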
[2024-04-22 06:10:22,628 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'cluster',
            'config_name': 'cluster_2c8m',
            'config': {'queryNode': {'resources': {'limits': {'cpu': '8.0',
                                                              'memory': '32Gi'},
                                                   'requests': {'cpu': '5.0',
                                                                'memory': '17Gi'}},
                                     'replicas': 1},
                       'indexNode': {'resources': {'limits': {'cpu': '8.0',
                                                              'memory': '8Gi'},
                                                   'requests': {'cpu': '5.0',
                                                                'memory': '5Gi'}},
                                     'replicas': 4},
                       'dataNode': {'resources': {'limits': {'cpu': '2.0',
                                                             'memory': '8Gi'},
                                                  'requests': {'cpu': '2.0',
                                                               'memory': '5Gi'}}},
                       'cluster': {'enabled': True},
                       'pulsar': {},
                       'kafka': {},
                       'minio': {'metrics': {'podMonitor': {'enabled': True}}},
                       'etcd': {'metrics': {'enabled': True,
                                            'podMonitor': {'enabled': True}}},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
                                         'tag': '2.4-20240418-238f9a4a-amd64'}}},
            'host': 'multi-vector-scene-mix-gkx8n-5-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_hybrid_search_locust_ddl_dql_cluster',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'scalars_index': {'id': {},
                                                                      'int64_1': {'index_type': 'INVERTED'},
                                                                      'varchar_1': {'index_type': 'INVERTED'}},
                                                    'vectors_index': {'float_vector_1': {'index_type': 'HNSW',
                                                                                         'index_param': {'M': 8,
                                                                                                         'efConstruction': 200},
                                                                                         'metric_type': 'L2'},
                                                                      'float_vector_2': {'index_type': 'DISKANN',
                                                                                         'index_param': {},
                                                                                         'metric_type': 'IP'},
                                                                      'float_vector_3': {'index_type': 'IVF_SQ8',
                                                                                         'index_param': {'nlist': 2048},
                                                                                         'metric_type': 'L2'}},
                                                    'scalars_params': {'float_vector_1': {'params': {'dim': 128},
                                                                                          'other_params': {'dataset': 'sift',
                                                                                                           'dim': 128}},
                                                                       'float_vector_2': {'params': {'dim': 128},
                                                                                          'other_params': {'dataset': 'sift',
                                                                                                           'dim': 128}},
                                                                       'float_vector_3': {'params': {'dim': 128},
                                                                                          'other_params': {'dataset': 'sift',
                                                                                                           'dim': 128}}},
                                                    'dataset_name': 'sift',
                                                    'dataset_size': 1000000,
                                                    'ni_per': 10000},
                                 'collection_params': {'other_fields': ['float_vector_1',
                                                                        'float_vector_2',
                                                                        'float_vector_3',
                                                                        'int64_1',
                                                                        'varchar_1'],
                                                       'shards_num': 2},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False,
                                                          'reset_db': False},
                                 'index_params': {'index_type': 'IVF_FLAT',
                                                  'index_param': {'nlist': 1024}},
                                 'concurrent_params': {'concurrent_number': 20,
                                                       'during_time': '12h',
                                                       'interval': 20,
                                                       'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'scene_test',
                                                       'weight': 1,
                                                       'params': {'dim': 128,
                                                                  'data_size': 3000,
                                                                  'nb': 3000,
                                                                  'index_type': 'IVF_SQ8',
                                                                  'index_param': {'nlist': 2048},
                                                                  'metric_type': 'L2',
                                                                  'other_fields': [],
                                                                  'scalars_params': {},
                                                                  'scalars_index': {},
                                                                  'vectors_index': {}}},
                                                      {'type': 'search',
                                                       'weight': 1,
                                                       'params': {'nq': 1000,
                                                                  'top_k': 1,
                                                                  'search_param': {'nprobe': 1000},
                                                                  'expr': 'int64_1 '
                                                                          '>= '
                                                                          '0',
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'output_fields': None,
                                                                  'ignore_growing': False,
                                                                  'group_by_field': None,
                                                                  'timeout': 600,
                                                                  'random_data': True}},
                                                      {'type': 'hybrid_search',
                                                       'weight': 1,
                                                       'params': {'nq': 1,
                                                                  'top_k': 100,
                                                                  'reqs': [{'search_param': {'nprobe': 128},
                                                                            'anns_field': 'float_vector',
                                                                            'expr': 'int64_1 '
                                                                                    '> '
                                                                                    '100000',
                                                                            'top_k': 100},
                                                                           {'search_param': {'ef': 64},
                                                                            'anns_field': 'float_vector_1',
                                                                            'expr': 'id '
                                                                                    '< '
                                                                                    '900000',
                                                                            'top_k': 10},
                                                                           {'search_param': {'search_list': 32},
                                                                            'anns_field': 'float_vector_2',
                                                                            'expr': 'varchar_1 '
                                                                                    '> '
                                                                                    '"1"',
                                                                            'top_k': 30},
                                                                           {'search_param': {'nprobe': 16},
                                                                            'anns_field': 'float_vector_3',
                                                                            'top_k': 400}],
                                                                  'rerank': {'WeightedRanker': [0.85,
                                                                                                0.95,
                                                                                                0.51,
                                                                                                0.32]},
                                                                  'output_fields': ['*'],
                                                                  'ignore_growing': False,
                                                                  'guarantee_timestamp': None,
                                                                  'partition_names': None,
                                                                  'timeout': 600,
                                                                  'random_data': True}},
                                                      {'type': 'query',
                                                       'weight': 1,
                                                       'params': {'ids': None,
                                                                  'expr': 'int64_1 '
                                                                          '> '
                                                                          '-1 '
                                                                          '&& ',
                                                                  'output_fields': ['*'],
                                                                  'offset': None,
                                                                  'limit': None,
                                                                  'ignore_growing': False,
                                                                  'partition_names': None,
                                                                  'timeout': 600,
                                                                  'random_data': True,
                                                                  'random_count': 20,
                                                                  'random_range': [0,
                                                                                   100000],
                                                                  'field_name': 'id',
                                                                  'field_type': 'int64'}}]},
            'run_id': 2024042120597657,
            'datetime': '2024-04-21 17:54:19.821585',
            'client_version': '2.4.0'},
 'result': {'test_result': {'index': {'RT': 106.0339,
                                      'float_vector_1': {'RT': 67.1676},
                                      'float_vector_2': {'RT': 135.8117},
                                      'float_vector_3': {'RT': 0.5274},
                                      'id': {'RT': 0.565},
                                      'int64_1': {'RT': 0.5292},
                                      'varchar_1': {'RT': 0.5217}},
                            'insert': {'total_time': 117.3401,
                                       'VPS': 8522.2358,
                                       'batch_time': 1.1734,
                                       'batch': 10000},
                            'flush': {'RT': 2.5548},
                            'load': {'RT': 8.5671},
                            'Locust': {'Aggregated': {'Requests': 39995,
                                                      'Fails': 1,
                                                      'RPS': 0.93,
                                                      'fail_s': 0.0,
                                                      'RT_max': 175974.93,
                                                      'RT_avg': 21577.61,
                                                      'TP50': 8800.0,
                                                      'TP99': 77000.0},
                                       'hybrid_search': {'Requests': 10162,
                                                         'Fails': 0,
                                                         'RPS': 0.24,
                                                         'fail_s': 0.0,
                                                         'RT_max': 49153.77,
                                                         'RT_avg': 5574.35,
                                                         'TP50': 2900.0,
                                                         'TP99': 30000.0},
                                       'query': {'Requests': 9910,
                                                 'Fails': 0,
                                                 'RPS': 0.23,
                                                 'fail_s': 0.0,
                                                 'RT_max': 66499.88,
                                                 'RT_avg': 2814.78,
                                                 'TP50': 990.0,
                                                 'TP99': 19000.0},
                                       'scene_test': {'Requests': 9935,
                                                      'Fails': 1,
                                                      'RPS': 0.23,
                                                      'fail_s': 0.0,
                                                      'RT_max': 175974.93,
                                                      'RT_avg': 65698.88,
                                                      'TP50': 64000.0,
                                                      'TP99': 95000.0},
                                       'search': {'Requests': 9988,
                                                  'Fails': 0,
                                                  'RPS': 0.23,
                                                  'fail_s': 0.0,
                                                  'RT_max': 54490.35,
                                                  'RT_avg': 12588.83,
                                                  'TP50': 13000.0,
                                                  'TP99': 32000.0}}}}}
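The `insert` block of the report is internally consistent: 1,000,000 rows (`dataset_size`) in batches of 10,000 (`ni_per`) gives 100 batches, and dividing by `total_time` reproduces `VPS` and `batch_time`. So the initial bulk load appears to have finished in about 117 s without error, and the timeout in this run came later, during the concurrent phase.

```python
dataset_size = 1_000_000   # 'dataset_size' in the report
ni_per = 10_000            # rows per insert batch ('ni_per')
total_time = 117.3401      # seconds, 'insert' -> 'total_time'

batches = dataset_size // ni_per
vps = dataset_size / total_time       # rows inserted per second
batch_time = total_time / batches     # average seconds per batch

print(batches, round(vps, 4), round(batch_time, 4))  # 100 8522.2358 1.1734
```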