wangting0128 opened 6 months ago
/assign @zhagnlu /unassign
test case name: test_concurrent_locust_25m_multi_hnsw_ddl_dql_dml_cluster image: 2.4-20240412-9613d368-amd64
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
fouramf-multi-vr-4n9z2-52-7648-etcd-0 1/1 Running 0 2d14h 10.104.17.59 4am-node23 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-etcd-1 1/1 Running 0 2d14h 10.104.30.146 4am-node38 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-etcd-2 1/1 Running 0 2d14h 10.104.34.195 4am-node37 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-datacoord-868d6c4b754c9vw 1/1 Running 0 2d14h 10.104.6.166 4am-node13 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-datanode-789d5cc66-4v2tw 1/1 Running 1 (2d14h ago) 2d14h 10.104.5.230 4am-node12 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-datanode-789d5cc66-g6m8f 1/1 Running 0 2d14h 10.104.6.167 4am-node13 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-indexcoord-9b95b694-bdbwc 1/1 Running 0 2d14h 10.104.17.53 4am-node23 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-indexnode-785d96769d46dsp 1/1 Running 0 2d14h 10.104.1.225 4am-node10 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-indexnode-785d96769d9wz4t 1/1 Running 0 2d14h 10.104.24.118 4am-node29 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-indexnode-785d96769dfsqdt 1/1 Running 0 2d14h 10.104.5.231 4am-node12 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-indexnode-785d96769dn6dzs 1/1 Running 0 2d14h 10.104.15.115 4am-node20 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-proxy-9f7654d86-p9fff 1/1 Running 1 (2d14h ago) 2d14h 10.104.4.69 4am-node11 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-querycoord-64cf66cc4tdwjv 1/1 Running 1 (2d14h ago) 2d14h 10.104.17.54 4am-node23 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-querynode-6799986d875h9p4 1/1 Running 0 2d14h 10.104.23.249 4am-node27 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-querynode-6799986d876cbvm 1/1 Running 0 2d14h 10.104.20.250 4am-node22 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-querynode-6799986d8776cbq 1/1 Running 0 2d14h 10.104.13.228 4am-node16 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-querynode-6799986d87krkwh 1/1 Running 0 2d14h 10.104.29.110 4am-node35 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-querynode-6799986d87l72cq 1/1 Running 0 2d14h 10.104.31.174 4am-node34 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-querynode-6799986d87tq256 1/1 Running 0 2d14h 10.104.4.70 4am-node11 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-milvus-rootcoord-7f58c6856788586 1/1 Running 1 (2d14h ago) 2d14h 10.104.34.190 4am-node37 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-minio-0 1/1 Running 0 2d14h 10.104.34.193 4am-node37 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-minio-1 1/1 Running 0 2d14h 10.104.33.119 4am-node36 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-minio-2 1/1 Running 0 2d14h 10.104.27.30 4am-node31 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-minio-3 1/1 Running 0 2d14h 10.104.19.134 4am-node28 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-pulsar-bookie-0 1/1 Running 0 2d14h 10.104.17.60 4am-node23 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-pulsar-bookie-1 1/1 Running 0 2d14h 10.104.30.147 4am-node38 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-pulsar-bookie-2 1/1 Running 0 2d14h 10.104.27.33 4am-node31 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-pulsar-bookie-init-6b7px 0/1 Completed 0 2d14h 10.104.21.103 4am-node24 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-pulsar-broker-0 1/1 Running 0 2d14h 10.104.26.27 4am-node32 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-pulsar-proxy-0 1/1 Running 0 2d14h 10.104.19.132 4am-node28 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-pulsar-pulsar-init-p9n4x 0/1 Completed 0 2d14h 10.104.26.28 4am-node32 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-pulsar-recovery-0 1/1 Running 0 2d14h 10.104.21.104 4am-node24 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-pulsar-zookeeper-0 1/1 Running 0 2d14h 10.104.33.118 4am-node36 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-pulsar-zookeeper-1 1/1 Running 0 2d14h 10.104.20.254 4am-node22 <none> <none>
fouramf-multi-vr-4n9z2-52-7648-pulsar-zookeeper-2 1/1 Running 0 2d14h 10.104.34.201 4am-node37 <none> <none>
Memory usage of dataCoord, rootCoord, and proxy increased during the test.
CPU usage of dataCoord, rootCoord, and proxy increased during the test.
client pod name: fouramf-multi-vector-4n9z2-121786545 client log: client.log.zip
test result:
[2024-04-14 23:44:23,363 - INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-04-14 23:44:23,364 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-04-14 23:44:23,364 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-14 23:44:23,364 - INFO - fouram]: grpc delete 24100 0(0.00%) | 34 2 8026 7 | 0.11 0.00 (stats.py:789)
[2024-04-14 23:44:23,364 - INFO - fouram]: grpc hybrid_search 478007 0(0.00%) | 36 8 149367 17 | 2.21 0.00 (stats.py:789)
[2024-04-14 23:44:23,365 - INFO - fouram]: grpc insert 23881 0(0.00%) | 607 12 106914 28 | 0.11 0.00 (stats.py:789)
[2024-04-14 23:44:23,365 - INFO - fouram]: grpc load 24052 40(0.17%) | 941 5 30003 56 | 0.11 0.00 (stats.py:789)
[2024-04-14 23:44:23,365 - INFO - fouram]: grpc query 238556 0(0.00%) | 33 3 40121 7 | 1.10 0.00 (stats.py:789)
[2024-04-14 23:44:23,365 - INFO - fouram]: grpc scene_hybrid_search_test 48175 18(0.04%) | 330567 8253 2342025 289000 | 0.22 0.00 (stats.py:789)
[2024-04-14 23:44:23,365 - INFO - fouram]: grpc scene_test 47785 17(0.04%) | 115598 7590 1519290 108000 | 0.22 0.00 (stats.py:789)
[2024-04-14 23:44:23,365 - INFO - fouram]: grpc search 476990 0(0.00%) | 38 10 135415 20 | 2.21 0.00 (stats.py:789)
[2024-04-14 23:44:23,365 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-04-14 23:44:23,365 - INFO - fouram]: Aggregated 1361546 75(0.01%) | 15813 2 2342025 18 | 6.31 0.00 (stats.py:789)
[2024-04-14 23:44:23,365 - INFO - fouram]: (stats.py:790)
[2024-04-14 23:44:23,370 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'cluster',
'config_name': 'cluster_2c2m',
'config': {'queryNode': {'resources': {'limits': {'cpu': '16',
'memory': '32Gi'},
'requests': {'cpu': '8',
'memory': '16Gi'}},
'replicas': 6},
'indexNode': {'resources': {'limits': {'cpu': '6.0',
'memory': '4Gi'},
'requests': {'cpu': '4.0',
'memory': '3Gi'}},
'replicas': 4},
'dataNode': {'resources': {'limits': {'cpu': '2.0',
'memory': '2Gi'},
'requests': {'cpu': '2.0',
'memory': '2Gi'}},
'replicas': 2},
'cluster': {'enabled': True},
'pulsar': {},
'kafka': {},
'minio': {'metrics': {'podMonitor': {'enabled': True}}},
'etcd': {'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': '2.4-20240412-9613d368-amd64'}}},
'host': 'fouramf-multi-vr-4n9z2-52-7648-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_concurrent_locust_25m_multi_hnsw_ddl_dql_dml_cluster',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 128,
'scalars_index': {'id': {}},
'vectors_index': {'float_vector_1': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'},
'float_vector_2': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'},
'float_vector_3': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'}},
'scalars_params': {'float_vector_1': {'params': {'dim': 200},
'other_params': {'dataset': 'text2img',
'dim': 200}},
'float_vector_2': {'params': {'dim': 128},
'other_params': {'dataset': 'sift',
'dim': 128}},
'float_vector_3': {'params': {'dim': 200},
'other_params': {'dataset': 'text2img',
'dim': 200}}},
'dataset_name': 'sift',
'dataset_size': 25000000,
'ni_per': 10000},
'collection_params': {'other_fields': ['float_vector_1',
'float_vector_2',
'float_vector_3',
'float_1'],
'shards_num': 2},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200}},
'concurrent_params': {'concurrent_number': 100,
'during_time': '60h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'insert',
'weight': 1,
'params': {'nb': 1,
'timeout': 600,
'random_id': True,
'random_vector': True,
'varchar_filled': False,
'start_id': 25000000}},
{'type': 'delete',
'weight': 1,
'params': {'expr': '',
'delete_length': 1,
'timeout': 30}},
{'type': 'search',
'weight': 20,
'params': {'nq': 10,
'top_k': 10,
'search_param': {'ef': 32},
'expr': {'float_1': {'GT': -1.0,
'LT': 12500000.0}},
'guarantee_timestamp': None,
'partition_names': None,
'output_fields': ['float_1',
'float_vector_1'],
'ignore_growing': False,
'group_by_field': None,
'timeout': 600,
'random_data': True}},
{'type': 'query',
'weight': 10,
'params': {'ids': None,
'expr': {'float_1': {'GT': 0,
'LT': 100}},
'output_fields': None,
'offset': None,
'limit': None,
'ignore_growing': False,
'partition_names': None,
'timeout': 600,
'random_data': False,
'random_count': 0,
'random_range': [0,
1],
'field_name': 'id',
'field_type': 'int64'}},
{'type': 'load',
'weight': 1,
'params': {'replica_number': 1,
'timeout': 30}},
{'type': 'scene_test',
'weight': 2,
'params': {'dim': 128,
'data_size': 3000,
'nb': 3000,
'index_type': 'IVF_SQ8',
'index_param': {'nlist': 2048},
'metric_type': 'L2',
'other_fields': [],
'scalars_params': {},
'scalars_index': {},
'vectors_index': {}}},
{'type': 'hybrid_search',
'weight': 20,
'params': {'nq': 1,
'top_k': 10,
'reqs': [{'search_param': {'ef': 128},
'anns_field': 'float_vector',
'top_k': 100},
{'search_param': {'ef': 64},
'anns_field': 'float_vector_1',
'top_k': 10},
{'search_param': {'ef': 256},
'anns_field': 'float_vector_2',
'top_k': 200},
{'search_param': {'ef': 64},
'anns_field': 'float_vector_3',
'top_k': 30}],
'rerank': {'WeightedRanker': [0.85,
0.95,
0.5,
0.5]},
'output_fields': ['*'],
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 600,
'random_data': True}},
{'type': 'scene_hybrid_search_test',
'weight': 2,
'params': {'nq': 1,
'top_k': 1,
'reqs': [{'search_param': {'nprobe': 128},
'anns_field': 'float_vector',
'top_k': 100},
{'search_param': {'nprobe': 32},
'anns_field': 'float_vector_1',
'top_k': 10},
{'search_param': {'ef': 32},
'anns_field': 'float_vector_2',
'top_k': 5},
{'search_param': {'search_list': 20},
'anns_field': 'float_vector_3',
'top_k': 10}],
'rerank': {'RRFRanker': []},
'output_fields': None,
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 600,
'random_data': True,
'dataset': 'local',
'dim': 128,
'shards_num': 2,
'data_size': 3000,
'nb': 3000,
'index_type': 'IVF_SQ8',
'index_param': {'nlist': 2048},
'metric_type': 'L2',
'other_fields': ['float_vector_1',
'float_vector_2',
'float_vector_3',
'int64_1',
'bool_1',
'varchar_1'],
'replica_number': 1,
'scalars_params': {'float_vector_1': {'params': {'dim': 128},
'other_params': {'dataset': 'sift',
'dim': 128}},
'float_vector_2': {'params': {'dim': 128},
'other_params': {'dataset': 'sift',
'dim': 128}},
'float_vector_3': {'params': {'dim': 128},
'other_params': {'dataset': 'sift',
'dim': 128}}},
'scalars_index': {'int64_1': {},
'bool_1': {'index_type': 'INVERTED'},
'varchar_1': {'index_type': 'INVERTED'}},
'vectors_index': {'float_vector_1': {'index_type': 'IVF_FLAT',
'index_param': {'nlist': 1024},
'metric_type': 'L2'},
'float_vector_2': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'},
'float_vector_3': {'index_type': 'DISKANN',
'index_param': {},
'metric_type': 'IP'}},
'prepare_before_insert': False,
'hybrid_search_counts': 10,
'new_connect': False,
'new_user': False}}]},
'run_id': 2024041248952987,
'datetime': '2024-04-12 09:41:35.152868',
'client_version': '2.2'},
'result': {'test_result': {'index': {'RT': 983.6586,
'float_vector_1': {'RT': 452.5967},
'float_vector_2': {'RT': 391.3642},
'float_vector_3': {'RT': 1.0187},
'id': {'RT': 0.5197}},
'insert': {'total_time': 3630.6284,
'VPS': 6885.8603,
'batch_time': 1.4523,
'batch': 10000},
'flush': {'RT': 2.526},
'load': {'RT': 22.1774},
'Locust': {'Aggregated': {'Requests': 1361546,
'Fails': 75,
'RPS': 6.31,
'fail_s': 0.0,
'RT_max': 2342025.36,
'RT_avg': 15813.42,
'TP50': 18,
'TP99': 379000.0},
'delete': {'Requests': 24100,
'Fails': 0,
'RPS': 0.11,
'fail_s': 0.0,
'RT_max': 8026.62,
'RT_avg': 34.12,
'TP50': 7,
'TP99': 540.0},
'hybrid_search': {'Requests': 478007,
'Fails': 0,
'RPS': 2.21,
'fail_s': 0.0,
'RT_max': 149367.0,
'RT_avg': 36.11,
'TP50': 17,
'TP99': 510.0},
'insert': {'Requests': 23881,
'Fails': 0,
'RPS': 0.11,
'fail_s': 0.0,
'RT_max': 106914.11,
'RT_avg': 607.16,
'TP50': 28,
'TP99': 8400.0},
'load': {'Requests': 24052,
'Fails': 40,
'RPS': 0.11,
'fail_s': 0.0,
'RT_max': 30003.68,
'RT_avg': 941.78,
'TP50': 56,
'TP99': 9800.0},
'query': {'Requests': 238556,
'Fails': 0,
'RPS': 1.1,
'fail_s': 0.0,
'RT_max': 40121.62,
'RT_avg': 34.0,
'TP50': 7,
'TP99': 520.0},
'scene_hybrid_search_test': {'Requests': 48175,
'Fails': 18,
'RPS': 0.22,
'fail_s': 0.0,
'RT_max': 2342025.36,
'RT_avg': 330567.44,
'TP50': 289000.0,
'TP99': 916000.0},
'scene_test': {'Requests': 47785,
'Fails': 17,
'RPS': 0.22,
'fail_s': 0.0,
'RT_max': 1519290.3,
'RT_avg': 115598.71,
'TP50': 108000.0,
'TP99': 240000.0},
'search': {'Requests': 476990,
'Fails': 0,
'RPS': 2.21,
'fail_s': 0.0,
'RT_max': 135415.52,
'RT_avg': 38.53,
'TP50': 20,
'TP99': 520.0}}}}}
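The `Aggregated` row is consistent with a request-weighted mean of the per-task averages. The two `scene_*` tasks make up only about 7% of requests but over 99% of summed request time. That is why the aggregated `RT_avg` (15813 ms) is hundreds of times the aggregated TP50 (18 ms).

The sketch below recomputes these figures from the `Locust` section of the first run's report above. The function names are illustrative and are not part of fouram or Locust.

```python
# Per-task (Requests, Fails, RT_avg in ms) copied from the 'Locust' section
# of the first run's report above.
STATS = {
    "delete": (24100, 0, 34.12),
    "hybrid_search": (478007, 0, 36.11),
    "insert": (23881, 0, 607.16),
    "load": (24052, 40, 941.78),
    "query": (238556, 0, 34.0),
    "scene_hybrid_search_test": (48175, 18, 330567.44),
    "scene_test": (47785, 17, 115598.71),
    "search": (476990, 0, 38.53),
}


def aggregate(stats):
    """Return (total requests, total fails, request-weighted mean RT in ms)."""
    total_reqs = sum(reqs for reqs, _, _ in stats.values())
    total_fails = sum(fails for _, fails, _ in stats.values())
    weighted_rt = sum(reqs * avg for reqs, _, avg in stats.values()) / total_reqs
    return total_reqs, total_fails, weighted_rt


def failure_rate_pct(reqs, fails):
    """Failure rate in percent, rounded like Locust's '40(0.17%)' column."""
    return round(100.0 * fails / reqs, 2)


def latency_share(stats, names):
    """Fraction of summed request time (reqs * avg) contributed by `names`."""
    total = sum(reqs * avg for reqs, _, avg in stats.values())
    part = sum(stats[n][0] * stats[n][2] for n in names)
    return part / total


if __name__ == "__main__":
    reqs, fails, rt = aggregate(STATS)
    print(reqs, fails, round(rt, 2))
    print(failure_rate_pct(*STATS["load"][:2]))
    print(round(latency_share(STATS, ["scene_hybrid_search_test", "scene_test"]), 3))
```

The same check can be run on the second run's report by swapping in its `Requests`, `Fails` and `RT_avg` values.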
/assign @longjiquan
Please help with this.
This issue includes several problems. First, inserts occasionally fail because of two bugs:
the error message is not shown correctly, because datacoord's AssignSegment does not return the error correctly;
datacoord's loading of the collection from rootcoord may time out, so a retry mechanism needs to be added.
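The proposed fix lives in the Go server (datacoord). Purely to illustrate the idea, here is a minimal Python sketch of a bounded retry with exponential backoff. It retries only timeouts and re-raises the last error unchanged, so the real cause still reaches the caller.

`retry_on_timeout` and the stand-in callable are hypothetical names, not Milvus or pymilvus APIs. The attempt count and delays are arbitrary assumptions.

```python
import time


def retry_on_timeout(fn, attempts=3, base_delay=0.1, max_delay=2.0,
                     retryable=(TimeoutError,), sleep=time.sleep):
    """Call fn(); on a retryable error, back off exponentially and retry.

    Non-retryable errors propagate immediately. After the final attempt the
    last retryable error is re-raised as-is instead of being replaced by a
    generic error, so the caller can still see what went wrong.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_describe_collection():
        # Hypothetical stand-in for a datacoord -> rootcoord collection lookup
        # that times out twice before succeeding.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TimeoutError("rootcoord timed out")
        return {"collection": "demo"}

    print(retry_on_timeout(flaky_describe_collection, sleep=lambda s: None), calls["n"])
```

Injecting `sleep` keeps the backoff testable without real waiting. A server-side version would also need to respect the caller's overall deadline.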
@wangting0128 do we have any run with pyroscope enabled?
Yes, let's discuss it offline.
argo task: fouramf-wxjlk-3985588210 test case name: test_concurrent_locust_25m_multi_hnsw_ddl_dql_dml_cluster image: 2.4-20240614-fd1c7b1a-amd64
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-25m-etcd-0 1/1 Running 0 2d22h 10.104.34.185 4am-node37 <none> <none>
multi-vector-25m-etcd-1 1/1 Running 0 2d22h 10.104.18.122 4am-node25 <none> <none>
multi-vector-25m-etcd-2 1/1 Running 0 2d22h 10.104.26.6 4am-node32 <none> <none>
multi-vector-25m-milvus-datacoord-5c8484f95c-5bzpc 1/1 Running 3 (2d22h ago) 2d22h 10.104.26.2 4am-node32 <none> <none>
multi-vector-25m-milvus-datanode-5b6c98ddfb-njttz 1/1 Running 3 (2d22h ago) 2d22h 10.104.20.87 4am-node22 <none> <none>
multi-vector-25m-milvus-datanode-5b6c98ddfb-qvmsx 1/1 Running 3 (2d22h ago) 2d22h 10.104.13.14 4am-node16 <none> <none>
multi-vector-25m-milvus-indexcoord-bc59d4984-6wwss 1/1 Running 0 2d22h 10.104.13.15 4am-node16 <none> <none>
multi-vector-25m-milvus-indexnode-7cdf4458dd-jhh6x 1/1 Running 3 (2d22h ago) 2d22h 10.104.30.35 4am-node38 <none> <none>
multi-vector-25m-milvus-indexnode-7cdf4458dd-prwdl 1/1 Running 3 (2d22h ago) 2d22h 10.104.6.178 4am-node13 <none> <none>
multi-vector-25m-milvus-indexnode-7cdf4458dd-x5dx8 1/1 Running 3 (2d22h ago) 2d22h 10.104.26.254 4am-node32 <none> <none>
multi-vector-25m-milvus-indexnode-7cdf4458dd-znghg 1/1 Running 3 (2d22h ago) 2d22h 10.104.17.168 4am-node23 <none> <none>
multi-vector-25m-milvus-proxy-6d49558fdd-fk8t6 1/1 Running 3 (2d22h ago) 2d22h 10.104.13.17 4am-node16 <none> <none>
multi-vector-25m-milvus-querycoord-64f44b5fc9-blzl7 1/1 Running 3 (2d22h ago) 2d22h 10.104.20.88 4am-node22 <none> <none>
multi-vector-25m-milvus-querynode-57cbcbc985-4zwvh 1/1 Running 3 (2d22h ago) 2d22h 10.104.20.89 4am-node22 <none> <none>
multi-vector-25m-milvus-querynode-57cbcbc985-cd59f 1/1 Running 3 (2d22h ago) 2d22h 10.104.34.182 4am-node37 <none> <none>
multi-vector-25m-milvus-querynode-57cbcbc985-jfqfc 1/1 Running 3 (2d22h ago) 2d22h 10.104.18.115 4am-node25 <none> <none>
multi-vector-25m-milvus-querynode-57cbcbc985-kr27h 1/1 Running 3 (2d22h ago) 2d22h 10.104.5.186 4am-node12 <none> <none>
multi-vector-25m-milvus-querynode-57cbcbc985-t2h2k 1/1 Running 3 (2d22h ago) 2d22h 10.104.13.18 4am-node16 <none> <none>
multi-vector-25m-milvus-querynode-57cbcbc985-t84cd 1/1 Running 2 (2d22h ago) 2d22h 10.104.4.84 4am-node11 <none> <none>
multi-vector-25m-milvus-rootcoord-6fc6c69b9c-hsct6 1/1 Running 3 (2d22h ago) 2d22h 10.104.26.253 4am-node32 <none> <none>
multi-vector-25m-minio-0 1/1 Running 0 2d22h 10.104.25.181 4am-node30 <none> <none>
multi-vector-25m-minio-1 1/1 Running 0 2d22h 10.104.26.7 4am-node32 <none> <none>
multi-vector-25m-minio-2 1/1 Running 0 2d22h 10.104.16.160 4am-node21 <none> <none>
multi-vector-25m-minio-3 1/1 Running 0 2d22h 10.104.30.37 4am-node38 <none> <none>
multi-vector-25m-pulsar-bookie-0 1/1 Running 0 2d22h 10.104.25.180 4am-node30 <none> <none>
multi-vector-25m-pulsar-bookie-1 1/1 Running 0 2d22h 10.104.18.121 4am-node25 <none> <none>
multi-vector-25m-pulsar-bookie-2 1/1 Running 0 2d22h 10.104.16.161 4am-node21 <none> <none>
multi-vector-25m-pulsar-bookie-init-dc2cm 0/1 Completed 0 2d22h 10.104.13.19 4am-node16 <none> <none>
multi-vector-25m-pulsar-broker-0 1/1 Running 0 2d22h 10.104.17.167 4am-node23 <none> <none>
multi-vector-25m-pulsar-proxy-0 1/1 Running 0 2d22h 10.104.14.20 4am-node18 <none> <none>
multi-vector-25m-pulsar-pulsar-init-gfh6v 0/1 Completed 0 2d22h 10.104.25.175 4am-node30 <none> <none>
multi-vector-25m-pulsar-recovery-0 1/1 Running 0 2d22h 10.104.34.181 4am-node37 <none> <none>
multi-vector-25m-pulsar-zookeeper-0 1/1 Running 0 2d22h 10.104.18.120 4am-node25 <none> <none>
multi-vector-25m-pulsar-zookeeper-1 1/1 Running 0 2d22h 10.104.25.183 4am-node30 <none> <none>
multi-vector-25m-pulsar-zookeeper-2 1/1 Running 0 2d22h 10.104.16.165 4am-node21 <none> <none>
client pod name: fouramf-wxjlk-3985588210 client logs: load timed out after 30s
[2024-06-14 21:37:15,491 - ERROR - fouram]: grpc RpcError: [load_collection], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2024-06-14 21:36:45.481683', 'gRPC error': '2024-06-14 21:37:15.491306'}> (decorators.py:150)
[2024-06-14 21:37:15,498 - ERROR - fouram]: (api_response) : [Collection.load] <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Deadline Exceeded", grpc_status:4, created_time:"2024-06-14T21:37:15.483385029+00:00"}"
>, [requestId: 32a3896c-2a96-11ef-9c53-3ec6fc29743c] (api_request.py:57)
[2024-06-14 21:37:15,499 - ERROR - fouram]: [CheckFunc] load request check failed, response:<_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Deadline Exceeded", grpc_status:4, created_time:"2024-06-14T21:37:15.483385029+00:00"}"
> (func_check.py:48)
[2024-06-14 21:37:36,127 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-06-14 21:37:36,127 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-06-14 21:37:36,127 - INFO - fouram]: grpc delete 10250 0(0.00%) | 67 2 7682 10 | 0.00 0.00 (stats.py:789)
[2024-06-14 21:37:36,127 - INFO - fouram]: grpc hybrid_search 206240 0(0.00%) | 53 8 7374 21 | 1.10 0.00 (stats.py:789)
[2024-06-14 21:37:36,128 - INFO - fouram]: grpc insert 10385 0(0.00%) | 96 4 7615 19 | 0.10 0.00 (stats.py:789)
[2024-06-14 21:37:36,128 - INFO - fouram]: grpc load 10549 1(0.01%) | 986 6 30011 410 | 0.00 0.00 (stats.py:789)
[2024-06-14 21:37:36,128 - INFO - fouram]: grpc query 103092 0(0.00%) | 58 3 8325 11 | 0.30 0.00 (stats.py:789)
[2024-06-14 21:37:36,128 - INFO - fouram]: grpc scene_hybrid_search_test 20574 0(0.00%) | 108112 10817 561526 99000 | 0.20 0.00 (stats.py:789)
[2024-06-14 21:37:36,128 - INFO - fouram]: grpc scene_test 20344 0(0.00%) | 80166 63290 243972 74000 | 0.30 0.00 (stats.py:789)
[2024-06-14 21:37:36,128 - INFO - fouram]: grpc search 206008 0(0.00%) | 60 12 7305 27 | 0.80 0.00 (stats.py:789)
[2024-06-14 21:37:36,128 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-06-14 21:37:36,128 - INFO - fouram]: Aggregated 587442 1(0.00%) | 6633 2 561526 24 | 2.80 0.00 (stats.py:789)
[2024-06-14 21:37:36,128 - INFO - fouram]: (stats.py:790)
[2024-06-14 21:39:53,940 - ERROR - fouram]: grpc RpcError: [load_collection], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2024-06-14 21:39:23.938380', 'gRPC error': '2024-06-14 21:39:53.940185'}> (decorators.py:150)
[2024-06-14 21:39:53,942 - ERROR - fouram]: (api_response) : [Collection.load] <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Deadline Exceeded", grpc_status:4, created_time:"2024-06-14T21:39:53.939026209+00:00"}"
>, [requestId: 91161ad2-2a96-11ef-9c53-3ec6fc29743c] (api_request.py:57)
[2024-06-14 21:39:53,942 - ERROR - fouram]: [CheckFunc] load request check failed, response:<_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Deadline Exceeded", grpc_status:4, created_time:"2024-06-14T21:39:53.939026209+00:00"}"
> (func_check.py:48)
[2024-06-14 21:39:56,407 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-06-14 21:39:56,407 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-06-14 21:39:56,408 - INFO - fouram]: grpc delete 10254 0(0.00%) | 67 2 7682 10 | 0.00 0.00 (stats.py:789)
[2024-06-14 21:39:56,408 - INFO - fouram]: grpc hybrid_search 206313 0(0.00%) | 53 8 7374 21 | 0.00 0.00 (stats.py:789)
[2024-06-14 21:39:56,408 - INFO - fouram]: grpc insert 10389 0(0.00%) | 96 4 7615 19 | 0.00 0.00 (stats.py:789)
[2024-06-14 21:39:56,408 - INFO - fouram]: grpc load 10553 2(0.02%) | 992 6 30011 410 | 0.00 0.00 (stats.py:789)
[2024-06-14 21:39:56,412 - INFO - fouram]: grpc query 103133 0(0.00%) | 58 3 8325 11 | 0.00 0.00 (stats.py:789)
[2024-06-14 21:39:56,412 - INFO - fouram]: grpc scene_hybrid_search_test 20583 0(0.00%) | 108280 10817 567065 99000 | 0.00 0.00 (stats.py:789)
[2024-06-14 21:39:56,412 - INFO - fouram]: grpc scene_test 20357 0(0.00%) | 80250 63290 248739 74000 | 0.00 0.00 (stats.py:789)
[2024-06-14 21:39:56,412 - INFO - fouram]: grpc search 206075 0(0.00%) | 60 12 7305 27 | 0.00 0.00 (stats.py:789)
[2024-06-14 21:39:56,412 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-06-14 21:39:56,412 - INFO - fouram]: Aggregated 587657 2(0.00%) | 6643 2 567065 24 | 0.00 0.00 (stats.py:789)
[2024-06-14 21:39:56,412 - INFO - fouram]: (stats.py:790)
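In both failed loads, the gap between the logged `RPC start` and `gRPC error` times is just over 30 s. That matches the load task's `'timeout': 30`. It is consistent with the client-side deadline expiring, rather than the server returning an error early.

The small check below uses the timestamps copied from the two `load_collection` errors above. The helper names and the 1 s slack are illustrative assumptions.

```python
from datetime import datetime

# ('RPC start', 'gRPC error') pairs copied from the two load_collection errors.
EVENTS = [
    ("2024-06-14 21:36:45.481683", "2024-06-14 21:37:15.491306"),
    ("2024-06-14 21:39:23.938380", "2024-06-14 21:39:53.940185"),
]
FMT = "%Y-%m-%d %H:%M:%S.%f"
CLIENT_TIMEOUT_S = 30  # 'timeout': 30 in the load task params


def elapsed_seconds(start, end):
    """Wall-clock seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


def hit_client_deadline(start, end, timeout=CLIENT_TIMEOUT_S, slack=1.0):
    """True if the RPC failed at roughly the client deadline (within `slack` s)."""
    return timeout <= elapsed_seconds(start, end) <= timeout + slack


if __name__ == "__main__":
    for start, end in EVENTS:
        print(round(elapsed_seconds(start, end), 3), hit_client_deadline(start, end))
```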
test result:
[2024-06-14 22:43:42,594 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-06-14 22:43:42,594 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc delete 11091 0(0.00%) | 67 2 7682 10 | 0.26 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc hybrid_search 224244 0(0.00%) | 53 8 7374 22 | 5.19 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc insert 11329 0(0.00%) | 97 4 7615 20 | 0.26 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc load 11454 2(0.02%) | 1008 6 30011 420 | 0.27 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc query 112006 0(0.00%) | 57 3 8325 11 | 2.59 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc scene_hybrid_search_test 22384 0(0.00%) | 110576 10817 695817 99000 | 0.52 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc scene_test 22206 0(0.00%) | 80556 63290 292848 74000 | 0.51 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc search 224084 0(0.00%) | 60 12 7305 28 | 5.19 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: Aggregated 638798 2(0.00%) | 6746 2 695817 25 | 14.79 0.00 (stats.py:789)
[2024-06-14 22:43:42,596 - INFO - fouram]: (stats.py:790)
[2024-06-14 22:43:42,603 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'cluster',
'config_name': 'cluster_2c2m',
'config': {'queryNode': {'resources': {'limits': {'cpu': '16',
'memory': '32Gi'},
'requests': {'cpu': '8',
'memory': '16Gi'}},
'replicas': 6},
'indexNode': {'resources': {'limits': {'cpu': '6.0',
'memory': '4Gi'},
'requests': {'cpu': '4.0',
'memory': '3Gi'}},
'replicas': 4},
'dataNode': {'resources': {'limits': {'cpu': '2.0',
'memory': '2Gi'},
'requests': {'cpu': '2.0',
'memory': '2Gi'}},
'replicas': 2},
'cluster': {'enabled': True},
'pulsar': {},
'kafka': {},
'minio': {'metrics': {'podMonitor': {'enabled': True}}},
'etcd': {'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': '2.4-20240614-fd1c7b1a-amd64'}}},
'host': 'multi-vector-25m-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_concurrent_locust_25m_multi_hnsw_ddl_dql_dml_cluster',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 128,
'scalars_index': {'id': {}},
'vectors_index': {'float_vector_1': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'},
'float_vector_2': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'},
'float_vector_3': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'}},
'scalars_params': {'float_vector_1': {'params': {'dim': 200},
'other_params': {'dataset': 'text2img'}},
'float_vector_2': {'params': {'dim': 128},
'other_params': {'dataset': 'sift'}},
'float_vector_3': {'params': {'dim': 200},
'other_params': {'dataset': 'text2img'}}},
'dataset_name': 'sift',
'dataset_size': 25000000,
'ni_per': 10000},
'collection_params': {'other_fields': ['float_vector_1',
'float_vector_2',
'float_vector_3',
'float_1'],
'shards_num': 2},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200}},
'concurrent_params': {'concurrent_number': 100,
'during_time': '12h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'insert',
'weight': 1,
'params': {'nb': 1,
'timeout': 600,
'random_id': True,
'random_vector': True,
'varchar_filled': False,
'start_id': 25000000}},
{'type': 'delete',
'weight': 1,
'params': {'expr': '',
'delete_length': 1,
'timeout': 30}},
{'type': 'search',
'weight': 20,
'params': {'nq': 10,
'top_k': 10,
'search_param': {'ef': 32},
'expr': {'float_1': {'GT': -1.0,
'LT': 12500000.0}},
'guarantee_timestamp': None,
'partition_names': None,
'output_fields': ['float_1',
'float_vector_1'],
'ignore_growing': False,
'group_by_field': None,
'timeout': 600,
'random_data': True}},
{'type': 'query',
'weight': 10,
'params': {'ids': None,
'expr': {'float_1': {'GT': 0,
'LT': 100}},
'output_fields': None,
'offset': None,
'limit': None,
'ignore_growing': False,
'partition_names': None,
'timeout': 600,
'random_data': False,
'random_count': 0,
'random_range': [0,
1],
'field_name': 'id',
'field_type': 'int64'}},
{'type': 'load',
'weight': 1,
'params': {'replica_number': 1,
'timeout': 30}},
{'type': 'scene_test',
'weight': 2,
'params': {'dim': 128,
'data_size': 3000,
'nb': 3000,
'index_type': 'IVF_SQ8',
'index_param': {'nlist': 2048},
'metric_type': 'L2',
'other_fields': [],
'scalars_params': {},
'scalars_index': {},
'vectors_index': {}}},
{'type': 'hybrid_search',
'weight': 20,
'params': {'nq': 1,
'top_k': 10,
'reqs': [{'search_param': {'ef': 128},
'anns_field': 'float_vector',
'top_k': 100},
{'search_param': {'ef': 64},
'anns_field': 'float_vector_1',
'top_k': 10},
{'search_param': {'ef': 256},
'anns_field': 'float_vector_2',
'top_k': 200},
{'search_param': {'ef': 64},
'anns_field': 'float_vector_3',
'top_k': 30}],
'rerank': {'WeightedRanker': [0.85,
0.95,
0.5,
0.5]},
'output_fields': ['*'],
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 600,
'random_data': True}},
{'type': 'scene_hybrid_search_test',
'weight': 2,
'params': {'nq': 1,
'top_k': 1,
'reqs': [{'search_param': {'nprobe': 128},
'anns_field': 'float_vector',
'top_k': 100},
{'search_param': {'nprobe': 32},
'anns_field': 'float_vector_1',
'top_k': 10},
{'search_param': {'ef': 32},
'anns_field': 'float_vector_2',
'top_k': 5},
{'search_param': {'search_list': 20},
'anns_field': 'float_vector_3',
'top_k': 10}],
'rerank': {'RRFRanker': []},
'output_fields': None,
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 600,
'random_data': True,
'dataset': 'local',
'dim': 128,
'shards_num': 2,
'data_size': 3000,
'nb': 3000,
'index_type': 'IVF_SQ8',
'index_param': {'nlist': 2048},
'metric_type': 'L2',
'other_fields': ['float_vector_1',
'float_vector_2',
'float_vector_3',
'int64_1',
'bool_1',
'varchar_1'],
'replica_number': 1,
'scalars_params': {'float_vector_1': {'params': {'dim': 128},
'other_params': {'dataset': 'sift'}},
'float_vector_2': {'params': {'dim': 128},
'other_params': {'dataset': 'sift'}},
'float_vector_3': {'params': {'dim': 128},
'other_params': {'dataset': 'sift'}}},
'scalars_index': {'int64_1': {},
'bool_1': {'index_type': 'INVERTED'},
'varchar_1': {'index_type': 'INVERTED'}},
'vectors_index': {'float_vector_1': {'index_type': 'IVF_FLAT',
'index_param': {'nlist': 1024},
'metric_type': 'L2'},
'float_vector_2': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'},
'float_vector_3': {'index_type': 'DISKANN',
'index_param': {},
'metric_type': 'IP'}},
'prepare_before_insert': False,
'hybrid_search_counts': 10,
'new_connect': False,
'new_user': False}}]},
'run_id': 2024061455048265,
'datetime': '2024-06-14 08:58:24.563525',
'client_version': '2.2'},
'result': {'test_result': {'index': {'RT': 797.7971,
'float_vector_1': {'RT': 314.0947},
'float_vector_2': {'RT': 160.9543},
'float_vector_3': {'RT': 56.7346},
'id': {'RT': 0.5175}},
'insert': {'total_time': 4500.9088,
'VPS': 5554.4338,
'batch_time': 1.8004,
'batch': 10000},
'flush': {'RT': 2.5759},
'load': {'RT': 26.6893},
'Locust': {'Aggregated': {'Requests': 638798,
'Fails': 2,
'RPS': 14.79,
'fail_s': 0.0,
'RT_max': 695817.7,
'RT_avg': 6746.23,
'TP50': 25,
'TP99': 142000.0},
'delete': {'Requests': 11091,
'Fails': 0,
'RPS': 0.26,
'fail_s': 0.0,
'RT_max': 7682.49,
'RT_avg': 67.31,
'TP50': 10,
'TP99': 1300.0},
'hybrid_search': {'Requests': 224244,
'Fails': 0,
'RPS': 5.19,
'fail_s': 0.0,
'RT_max': 7374.99,
'RT_avg': 53.59,
'TP50': 22,
'TP99': 780.0},
'insert': {'Requests': 11329,
'Fails': 0,
'RPS': 0.26,
'fail_s': 0.0,
'RT_max': 7615.48,
'RT_avg': 97.93,
'TP50': 20,
'TP99': 1600.0},
'load': {'Requests': 11454,
'Fails': 2,
'RPS': 0.27,
'fail_s': 0.0,
'RT_max': 30011.03,
'RT_avg': 1008.53,
'TP50': 420.0,
'TP99': 8400.0},
'query': {'Requests': 112006,
'Fails': 0,
'RPS': 2.59,
'fail_s': 0.0,
'RT_max': 8325.88,
'RT_avg': 57.44,
'TP50': 11,
'TP99': 1000.0},
'scene_hybrid_search_test': {'Requests': 22384,
'Fails': 0,
'RPS': 0.52,
'fail_s': 0.0,
'RT_max': 695817.7,
'RT_avg': 110576.9,
'TP50': 99000.0,
'TP99': 356000.0},
'scene_test': {'Requests': 22206,
'Fails': 0,
'RPS': 0.51,
'fail_s': 0.0,
'RT_max': 292848.61,
'RT_avg': 80556.57,
'TP50': 74000.0,
'TP99': 153000.0},
'search': {'Requests': 224084,
'Fails': 0,
'RPS': 5.19,
'fail_s': 0.0,
'RT_max': 7305.05,
'RT_avg': 60.82,
'TP50': 28,
'TP99': 850.0}}}}}
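In both runs, the `scene_hybrid_search_test` task uses `'rerank': {'RRFRanker': []}`, i.e. Reciprocal Rank Fusion with default parameters. Below is a minimal sketch of standard RRF: score(d) = Σ 1/(k + rank_i(d)).

The constant k = 60 and 1-based ranks are assumptions about the defaults and are not stated in this report. `rrf_fuse` is an illustrative function, not the pymilvus API, and the id lists are made up.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60, top_k=None):
    """Reciprocal Rank Fusion over several best-first lists of ids.

    An id at 1-based position r in a list contributes 1 / (k + r); ids absent
    from a list contribute nothing for it. Ties are broken by id so that the
    output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused if top_k is None else fused[:top_k]


if __name__ == "__main__":
    # Hypothetical per-field results, one list per entry in 'reqs'.
    per_field = [
        [7, 3, 9],  # float_vector
        [3, 7, 1],  # float_vector_1
        [3, 2],     # float_vector_2
        [9, 3, 7],  # float_vector_3
    ]
    print(rrf_fuse(per_field, top_k=1))  # 'top_k': 1 as in the task config
```

Id 3 wins here because it ranks near the top of every list, even though id 7 is first in one of them. RRF uses only ranks, so the mix of L2 and IP metrics across fields does not need score normalization.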
Is there an existing issue for this?
Environment
Current Behavior
argo task: fouramf-multi-vector-dn4zz test case name: test_concurrent_locust_25m_multi_hnsw_ddl_dql_dml_cluster
server:
queryNode CPU usage
queryNode memory usage
proxy CPU usage
proxy memory usage
client pod name: fouramf-multi-vector-dn4zz-3641650604 client log: test_concurrent_locust_25m_multi_hnsw_ddl_dql_dml_cluster.zip
Expected Behavior
No response
Steps To Reproduce
Milvus Log
No response
Anything else?
test result: