Open wangting0128 opened 5 months ago
It looks obvious that scene_hybrid_search_test is getting slower and slower. @wangting0128 one more question: can we tell whether the operations in the screenshots above, such as create collection and drop collection, come from scene_hybrid_search_test or scene_test?
/unassign
CreateCollection and DropCollection are called by both scene_test and scene_hybrid_search_test.
I did an initial check with @czs007. It is caused by rootCoord holding too many collection metrics.
It might be due to our snapshot GC issue:
all the meta takes 24 hours to be garbage collected.
@wangting0128 any updates for milvus 2.4 latest build? I believe rootcoord might not be the bottleneck any more with 10K+ collections.
There is no update. I will test again using the latest master branch image today
Verified with image: master-20240520-555df49d-amd64, the problem of request RT rising seems to have been alleviated.
The 2.4 branch seems to still have this problem
argo task:fouramf-pxv6r-release image:2.4-20240520-2f260cd3-amd64 test case name:test_concurrent_locust_50m_multi_ivf_sq8_ddl_dql_standalone
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
fouramf-pxv6r-release-86-8408-etcd-0 1/1 Running 0 23h 10.104.24.203 4am-node29 <none> <none>
fouramf-pxv6r-release-86-8408-milvus-standalone-c49d84dcb-ttpm6 1/1 Running 3 (23h ago) 23h 10.104.30.176 4am-node38 <none> <none>
fouramf-pxv6r-release-86-8408-minio-79dd5dc784-27hlh 1/1 Running 0 23h 10.104.21.152 4am-node24 <none> <none>
client pod name: fouramf-pxv6r-release-3102168818 client monitor:
test result:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'standalone',
'config_name': 'standalone_32c128m',
'config': {'standalone': {'resources': {'limits': {'cpu': '32.0',
'memory': '128Gi'},
'requests': {'cpu': '17.0',
'memory': '65Gi'}}},
'cluster': {'enabled': False},
'etcd': {'replicaCount': 1,
'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'minio': {'mode': 'standalone',
'metrics': {'podMonitor': {'enabled': True}}},
'pulsar': {'enabled': False},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': '2.4-20240520-2f260cd3-amd64'}}},
'host': 'fouramf-pxv6r-release-86-8408-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_concurrent_locust_50m_multi_ivf_sq8_ddl_dql_standalone',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 128,
'scalars_index': {'id': {}},
'vectors_index': {'float_vector_1': {'index_type': 'IVF_SQ8',
'index_param': {'nlist': 1024},
'metric_type': 'L2'}},
'scalars_params': {'float_vector_1': {'params': {'dim': 200},
'other_params': {'dataset': 'text2img',
'dim': 200}}},
'dataset_name': 'sift',
'dataset_size': 50000000,
'ni_per': 25000},
'collection_params': {'other_fields': ['float_vector_1'],
'shards_num': 2},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'IVF_SQ8',
'index_param': {'nlist': 2048}},
'concurrent_params': {'concurrent_number': 20,
'during_time': '12h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'search',
'weight': 20,
'params': {'nq': 10,
'top_k': 10,
'search_param': {'nprobe': 16},
'expr': None,
'guarantee_timestamp': None,
'partition_names': None,
'output_fields': None,
'ignore_growing': False,
'group_by_field': None,
'timeout': 60,
'random_data': True}},
{'type': 'query',
'weight': 10,
'params': {'ids': None,
'expr': ' '
'110 '
'> '
'id '
'> '
'100',
'output_fields': None,
'offset': None,
'limit': None,
'ignore_growing': False,
'partition_names': None,
'timeout': 60,
'random_data': False,
'random_count': 0,
'random_range': [0,
1],
'field_name': 'id',
'field_type': 'int64'}},
{'type': 'load',
'weight': 1,
'params': {'replica_number': 1,
'timeout': 30}},
{'type': 'scene_test',
'weight': 2,
'params': {'dim': 128,
'data_size': 3000,
'nb': 3000,
'index_type': 'IVF_SQ8',
'index_param': {'nlist': 2048},
'metric_type': 'L2',
'other_fields': [],
'scalars_params': {},
'scalars_index': {},
'vectors_index': {}}},
{'type': 'hybrid_search',
'weight': 20,
'params': {'nq': 1,
'top_k': 10,
'reqs': [{'search_param': {'nprobe': 128},
'anns_field': 'float_vector',
'top_k': 100},
{'search_param': {'nprobe': 64},
'anns_field': 'float_vector_1',
'top_k': 10}],
'rerank': {'WeightedRanker': [0.85,
0.95]},
'output_fields': ['*'],
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 600,
'random_data': True}},
{'type': 'scene_hybrid_search_test',
'weight': 1,
'params': {'nq': 1,
'top_k': 1,
'reqs': [{'search_param': {'nprobe': 128},
'anns_field': 'float_vector',
'top_k': 100},
{'search_param': {'nprobe': 32},
'anns_field': 'float_vector_1',
'top_k': 10},
{'search_param': {'ef': 32},
'anns_field': 'float_vector_2',
'top_k': 5},
{'search_param': {'search_list': 20},
'anns_field': 'float_vector_3',
'top_k': 10}],
'rerank': {'RRFRanker': []},
'output_fields': None,
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 600,
'random_data': True,
'dataset': 'local',
'dim': 128,
'shards_num': 2,
'data_size': 3000,
'nb': 3000,
'index_type': 'IVF_SQ8',
'index_param': {'nlist': 2048},
'metric_type': 'L2',
'other_fields': ['float_vector_1',
'float_vector_2',
'float_vector_3',
'int64_1',
'bool_1',
'varchar_1'],
'replica_number': 1,
'scalars_params': {'float_vector_1': {'params': {'dim': 128},
'other_params': {'dataset': 'sift',
'dim': 128}},
'float_vector_2': {'params': {'dim': 128},
'other_params': {'dataset': 'sift',
'dim': 128}},
'float_vector_3': {'params': {'dim': 128},
'other_params': {'dataset': 'sift',
'dim': 128}}},
'scalars_index': {'int64_1': {},
'bool_1': {'index_type': 'INVERTED'},
'varchar_1': {'index_type': 'INVERTED'}},
'vectors_index': {'float_vector_1': {'index_type': 'IVF_FLAT',
'index_param': {'nlist': 1024},
'metric_type': 'L2'},
'float_vector_2': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'},
'float_vector_3': {'index_type': 'DISKANN',
'index_param': {},
'metric_type': 'IP'}},
'prepare_before_insert': False,
'hybrid_search_counts': 10,
'new_connect': False,
'new_user': False}}]},
'run_id': 2024052129701132,
'datetime': '2024-05-21 03:42:50.076883',
'client_version': '2.2'},
'result': {'test_result': {'index': {'RT': 7631.7959,
'float_vector_1': {'RT': 6649.1025},
'id': {'RT': 5278.8701}},
'insert': {'total_time': 3443.8363,
'VPS': 14518.6924,
'batch_time': 1.7219,
'batch': 25000},
'flush': {'RT': 2.5153},
'load': {'RT': 127.4589},
'Locust': {'Aggregated': {'Requests': 202200,
'Fails': 0,
'RPS': 4.68,
'fail_s': 0.0,
'RT_max': 370200.46,
'RT_avg': 4266.24,
'TP50': 16,
'TP99': 84000.0},
'hybrid_search': {'Requests': 74794,
'Fails': 0,
'RPS': 1.73,
'fail_s': 0.0,
'RT_max': 9580.78,
'RT_avg': 46.94,
'TP50': 40,
'TP99': 130.0},
'load': {'Requests': 3728,
'Fails': 0,
'RPS': 0.09,
'fail_s': 0.0,
'RT_max': 3456.41,
'RT_avg': 9.11,
'TP50': 5,
'TP99': 41},
'query': {'Requests': 37679,
'Fails': 0,
'RPS': 0.87,
'fail_s': 0.0,
'RT_max': 8792.82,
'RT_avg': 11.66,
'TP50': 9,
'TP99': 31},
'scene_hybrid_search_test': {'Requests': 3647,
'Fails': 0,
'RPS': 0.08,
'fail_s': 0.0,
'RT_max': 364687.24,
'RT_avg': 85282.19,
'TP50': 85000.0,
'TP99': 142000.0},
'scene_test': {'Requests': 7420,
'Fails': 0,
'RPS': 0.17,
'fail_s': 0.0,
'RT_max': 370200.46,
'RT_avg': 73652.97,
'TP50': 73000.0,
'TP99': 85000.0},
'search': {'Requests': 74932,
'Fails': 0,
'RPS': 1.73,
'fail_s': 0.0,
'RT_max': 7573.87,
'RT_avg': 14.96,
'TP50': 11,
'TP99': 40}}}}}
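As a sanity check on the report above, the per-task request counts can be compared with the configured task weights. This is a minimal sketch, assuming the test client picks tasks in proportion to `weight` (as Locust task weighting does); the helper name `expected_requests` is illustrative and not part of the test framework. The numbers are copied from the report.

```python
# Task weights and request counts copied from the 2.4-20240520 report above.
weights = {
    "search": 20,
    "query": 10,
    "load": 1,
    "scene_test": 2,
    "hybrid_search": 20,
    "scene_hybrid_search_test": 1,
}
requests = {
    "search": 74932,
    "query": 37679,
    "load": 3728,
    "scene_test": 7420,
    "hybrid_search": 74794,
    "scene_hybrid_search_test": 3647,
}


def expected_requests(weights, total):
    """Split `total` requests across tasks in proportion to their weights."""
    weight_sum = sum(weights.values())
    return {name: total * w / weight_sum for name, w in weights.items()}


total = sum(requests.values())  # equals the 'Aggregated' request count
expected = expected_requests(weights, total)

for name in weights:
    deviation = (requests[name] - expected[name]) / expected[name] * 100
    print(f"{name:26s} expected={expected[name]:9.1f} "
          f"actual={requests[name]:6d} deviation={deviation:+.2f}%")
```

The observed counts stay within a few percent of the weighted expectation, so the request mix itself looks as configured; the slowdown shows up in the RT of the two scene tasks, not in how often they are scheduled.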
Some of the data might have leaked in the coordinator, causing the latency to go up.
@shaoting-huang please help on investigating it.
The earliest version in this issue (2.4-20240412-9613d368-amd64) uses datacoord channel manager v1, which is based on etcd, and this caused the DDL RT to increase.
By comparison, versions 2.4-20240520-2f260cd3-amd64 and master-20240520-555df49d-amd64 both use datacoord channel manager v2, which is based on RPC, so the DDL RT increase is alleviated. I do not see any delay with version 2.4-20240520-2f260cd3-amd64.
argo task:fouramf-mltbl test case name: test_concurrent_locust_50m_multi_ivf_sq8_ddl_dql_standalone image: 2.4-20240603-b9b76ee9-amd64
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-50m-etcd-0 1/1 Running 0 23h 10.104.20.46 4am-node22 <none> <none>
multi-vector-50m-milvus-standalone-7f98cbd8c4-c2xfj 1/1 Running 3 (23h ago) 23h 10.104.18.93 4am-node25 <none> <none>
multi-vector-50m-minio-6f49c4c4f-gx9x4 1/1 Running 0 23h 10.104.18.94 4am-node25 <none> <none>
The upward trend of RT has eased compared to before, but it still shows an upward trend overall.
client pod name: fouramf-mltbl-1357292759 client monitor:
test result:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'standalone',
'config_name': 'standalone_32c128m',
'config': {'standalone': {'resources': {'limits': {'cpu': '32.0',
'memory': '128Gi'},
'requests': {'cpu': '17.0',
'memory': '65Gi'}}},
'cluster': {'enabled': False},
'etcd': {'replicaCount': 1,
'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'minio': {'mode': 'standalone',
'metrics': {'podMonitor': {'enabled': True}}},
'pulsar': {'enabled': False},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': '2.4-20240603-b9b76ee9-amd64'}}},
'host': 'multi-vector-50m-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_concurrent_locust_50m_multi_ivf_sq8_ddl_dql_standalone',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 128,
'scalars_index': {'id': {}},
'vectors_index': {'float_vector_1': {'index_type': 'IVF_SQ8',
'index_param': {'nlist': 1024},
'metric_type': 'L2'}},
'scalars_params': {'float_vector_1': {'params': {'dim': 200},
'other_params': {'dataset': 'text2img',
'dim': 200}}},
'dataset_name': 'sift',
'dataset_size': 50000000,
'ni_per': 25000},
'collection_params': {'other_fields': ['float_vector_1'],
'shards_num': 2},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'IVF_SQ8',
'index_param': {'nlist': 2048}},
'concurrent_params': {'concurrent_number': 20,
'during_time': '12h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'search',
'weight': 20,
'params': {'nq': 10,
'top_k': 10,
'search_param': {'nprobe': 16},
'expr': None,
'guarantee_timestamp': None,
'partition_names': None,
'output_fields': None,
'ignore_growing': False,
'group_by_field': None,
'timeout': 60,
'random_data': True}},
{'type': 'query',
'weight': 10,
'params': {'ids': None,
'expr': ' '
'110 '
'> '
'id '
'> '
'100',
'output_fields': None,
'offset': None,
'limit': None,
'ignore_growing': False,
'partition_names': None,
'timeout': 60,
'random_data': False,
'random_count': 0,
'random_range': [0,
1],
'field_name': 'id',
'field_type': 'int64'}},
{'type': 'load',
'weight': 1,
'params': {'replica_number': 1,
'timeout': 30}},
{'type': 'scene_test',
'weight': 2,
'params': {'dim': 128,
'data_size': 3000,
'nb': 3000,
'index_type': 'IVF_SQ8',
'index_param': {'nlist': 2048},
'metric_type': 'L2',
'other_fields': [],
'scalars_params': {},
'scalars_index': {},
'vectors_index': {}}},
{'type': 'hybrid_search',
'weight': 20,
'params': {'nq': 1,
'top_k': 10,
'reqs': [{'search_param': {'nprobe': 128},
'anns_field': 'float_vector',
'top_k': 100},
{'search_param': {'nprobe': 64},
'anns_field': 'float_vector_1',
'top_k': 10}],
'rerank': {'WeightedRanker': [0.85,
0.95]},
'output_fields': ['*'],
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 600,
'random_data': True}},
{'type': 'scene_hybrid_search_test',
'weight': 1,
'params': {'nq': 1,
'top_k': 1,
'reqs': [{'search_param': {'nprobe': 128},
'anns_field': 'float_vector',
'top_k': 100},
{'search_param': {'nprobe': 32},
'anns_field': 'float_vector_1',
'top_k': 10},
{'search_param': {'ef': 32},
'anns_field': 'float_vector_2',
'top_k': 5},
{'search_param': {'search_list': 20},
'anns_field': 'float_vector_3',
'top_k': 10}],
'rerank': {'RRFRanker': []},
'output_fields': None,
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 600,
'random_data': True,
'dataset': 'local',
'dim': 128,
'shards_num': 2,
'data_size': 3000,
'nb': 3000,
'index_type': 'IVF_SQ8',
'index_param': {'nlist': 2048},
'metric_type': 'L2',
'other_fields': ['float_vector_1',
'float_vector_2',
'float_vector_3',
'int64_1',
'bool_1',
'varchar_1'],
'replica_number': 1,
'scalars_params': {'float_vector_1': {'params': {'dim': 128},
'other_params': {'dataset': 'sift',
'dim': 128}},
'float_vector_2': {'params': {'dim': 128},
'other_params': {'dataset': 'sift',
'dim': 128}},
'float_vector_3': {'params': {'dim': 128},
'other_params': {'dataset': 'sift',
'dim': 128}}},
'scalars_index': {'int64_1': {},
'bool_1': {'index_type': 'INVERTED'},
'varchar_1': {'index_type': 'INVERTED'}},
'vectors_index': {'float_vector_1': {'index_type': 'IVF_FLAT',
'index_param': {'nlist': 1024},
'metric_type': 'L2'},
'float_vector_2': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'},
'float_vector_3': {'index_type': 'DISKANN',
'index_param': {},
'metric_type': 'IP'}},
'prepare_before_insert': False,
'hybrid_search_counts': 10,
'new_connect': False,
'new_user': False}}]},
'run_id': 2024060351195443,
'datetime': '2024-06-03 03:25:19.174182',
'client_version': '2.2'},
'result': {'test_result': {'index': {'RT': 7154.5431,
'float_vector_1': {'RT': 5665.9816},
'id': {'RT': 3861.1322}},
'insert': {'total_time': 5397.7927,
'VPS': 9263.0456,
'batch_time': 2.6989,
'batch': 25000},
'flush': {'RT': 2.6004},
'load': {'RT': 14.6429},
'Locust': {'Aggregated': {'Requests': 209960,
'Fails': 0,
'RPS': 4.86,
'fail_s': 0.0,
'RT_max': 273524.73,
'RT_avg': 4107.62,
'TP50': 18,
'TP99': 79000.0},
'hybrid_search': {'Requests': 77802,
'Fails': 0,
'RPS': 1.8,
'fail_s': 0.0,
'RT_max': 3209.65,
'RT_avg': 50.46,
'TP50': 40,
'TP99': 230.0},
'load': {'Requests': 3855,
'Fails': 0,
'RPS': 0.09,
'fail_s': 0.0,
'RT_max': 2480.95,
'RT_avg': 10.48,
'TP50': 5,
'TP99': 47},
'query': {'Requests': 38875,
'Fails': 0,
'RPS': 0.9,
'fail_s': 0.0,
'RT_max': 1530.05,
'RT_avg': 12.13,
'TP50': 9,
'TP99': 40},
'scene_hybrid_search_test': {'Requests': 3862,
'Fails': 0,
'RPS': 0.09,
'fail_s': 0.0,
'RT_max': 273524.73,
'RT_avg': 75621.12,
'TP50': 75000.0,
'TP99': 141000.0},
'scene_test': {'Requests': 7821,
'Fails': 0,
'RPS': 0.18,
'fail_s': 0.0,
'RT_max': 247933.05,
'RT_avg': 72216.35,
'TP50': 72000.0,
'TP99': 83000.0},
'search': {'Requests': 77745,
'Fails': 0,
'RPS': 1.8,
'fail_s': 0.0,
'RT_max': 1509.37,
'RT_avg': 14.75,
'TP50': 11,
'TP99': 47}}}}}
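To put "eased compared to before" into numbers, the `RT_avg` values of the two 2.4 runs reported above can be compared directly. This is a small sketch using only figures copied from the two reports; the helper `percent_change` is illustrative.

```python
# RT_avg values copied from the two reports above (units as reported by the client).
runs = {
    "2.4-20240520-2f260cd3-amd64": {
        "scene_test": 73652.97,
        "scene_hybrid_search_test": 85282.19,
        "search": 14.96,
        "query": 11.66,
    },
    "2.4-20240603-b9b76ee9-amd64": {
        "scene_test": 72216.35,
        "scene_hybrid_search_test": 75621.12,
        "search": 14.75,
        "query": 12.13,
    },
}


def percent_change(old, new):
    """Relative change from old to new, in percent (negative means faster)."""
    return (new - old) / old * 100.0


baseline = runs["2.4-20240520-2f260cd3-amd64"]
candidate = runs["2.4-20240603-b9b76ee9-amd64"]
changes = {task: percent_change(baseline[task], candidate[task]) for task in baseline}

for task, change in changes.items():
    print(f"{task:26s} {baseline[task]:10.2f} -> {candidate[task]:10.2f} ({change:+.2f}%)")
```

By this measure scene_hybrid_search_test improved by roughly 11% between the two builds, while scene_test moved by only about 2%. Both still average far above the plain search and query tasks.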
argo task:fouramf-wxjlk test case name:test_concurrent_locust_25m_multi_hnsw_ddl_dql_dml_cluster image:2.4-20240614-fd1c7b1a-amd64
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-vector-25m-etcd-0 1/1 Running 0 2d22h 10.104.34.185 4am-node37 <none> <none>
multi-vector-25m-etcd-1 1/1 Running 0 2d22h 10.104.18.122 4am-node25 <none> <none>
multi-vector-25m-etcd-2 1/1 Running 0 2d22h 10.104.26.6 4am-node32 <none> <none>
multi-vector-25m-milvus-datacoord-5c8484f95c-5bzpc 1/1 Running 3 (2d22h ago) 2d22h 10.104.26.2 4am-node32 <none> <none>
multi-vector-25m-milvus-datanode-5b6c98ddfb-njttz 1/1 Running 3 (2d22h ago) 2d22h 10.104.20.87 4am-node22 <none> <none>
multi-vector-25m-milvus-datanode-5b6c98ddfb-qvmsx 1/1 Running 3 (2d22h ago) 2d22h 10.104.13.14 4am-node16 <none> <none>
multi-vector-25m-milvus-indexcoord-bc59d4984-6wwss 1/1 Running 0 2d22h 10.104.13.15 4am-node16 <none> <none>
multi-vector-25m-milvus-indexnode-7cdf4458dd-jhh6x 1/1 Running 3 (2d22h ago) 2d22h 10.104.30.35 4am-node38 <none> <none>
multi-vector-25m-milvus-indexnode-7cdf4458dd-prwdl 1/1 Running 3 (2d22h ago) 2d22h 10.104.6.178 4am-node13 <none> <none>
multi-vector-25m-milvus-indexnode-7cdf4458dd-x5dx8 1/1 Running 3 (2d22h ago) 2d22h 10.104.26.254 4am-node32 <none> <none>
multi-vector-25m-milvus-indexnode-7cdf4458dd-znghg 1/1 Running 3 (2d22h ago) 2d22h 10.104.17.168 4am-node23 <none> <none>
multi-vector-25m-milvus-proxy-6d49558fdd-fk8t6 1/1 Running 3 (2d22h ago) 2d22h 10.104.13.17 4am-node16 <none> <none>
multi-vector-25m-milvus-querycoord-64f44b5fc9-blzl7 1/1 Running 3 (2d22h ago) 2d22h 10.104.20.88 4am-node22 <none> <none>
multi-vector-25m-milvus-querynode-57cbcbc985-4zwvh 1/1 Running 3 (2d22h ago) 2d22h 10.104.20.89 4am-node22 <none> <none>
multi-vector-25m-milvus-querynode-57cbcbc985-cd59f 1/1 Running 3 (2d22h ago) 2d22h 10.104.34.182 4am-node37 <none> <none>
multi-vector-25m-milvus-querynode-57cbcbc985-jfqfc 1/1 Running 3 (2d22h ago) 2d22h 10.104.18.115 4am-node25 <none> <none>
multi-vector-25m-milvus-querynode-57cbcbc985-kr27h 1/1 Running 3 (2d22h ago) 2d22h 10.104.5.186 4am-node12 <none> <none>
multi-vector-25m-milvus-querynode-57cbcbc985-t2h2k 1/1 Running 3 (2d22h ago) 2d22h 10.104.13.18 4am-node16 <none> <none>
multi-vector-25m-milvus-querynode-57cbcbc985-t84cd 1/1 Running 2 (2d22h ago) 2d22h 10.104.4.84 4am-node11 <none> <none>
multi-vector-25m-milvus-rootcoord-6fc6c69b9c-hsct6 1/1 Running 3 (2d22h ago) 2d22h 10.104.26.253 4am-node32 <none> <none>
multi-vector-25m-minio-0 1/1 Running 0 2d22h 10.104.25.181 4am-node30 <none> <none>
multi-vector-25m-minio-1 1/1 Running 0 2d22h 10.104.26.7 4am-node32 <none> <none>
multi-vector-25m-minio-2 1/1 Running 0 2d22h 10.104.16.160 4am-node21 <none> <none>
multi-vector-25m-minio-3 1/1 Running 0 2d22h 10.104.30.37 4am-node38 <none> <none>
multi-vector-25m-pulsar-bookie-0 1/1 Running 0 2d22h 10.104.25.180 4am-node30 <none> <none>
multi-vector-25m-pulsar-bookie-1 1/1 Running 0 2d22h 10.104.18.121 4am-node25 <none> <none>
multi-vector-25m-pulsar-bookie-2 1/1 Running 0 2d22h 10.104.16.161 4am-node21 <none> <none>
multi-vector-25m-pulsar-bookie-init-dc2cm 0/1 Completed 0 2d22h 10.104.13.19 4am-node16 <none> <none>
multi-vector-25m-pulsar-broker-0 1/1 Running 0 2d22h 10.104.17.167 4am-node23 <none> <none>
multi-vector-25m-pulsar-proxy-0 1/1 Running 0 2d22h 10.104.14.20 4am-node18 <none> <none>
multi-vector-25m-pulsar-pulsar-init-gfh6v 0/1 Completed 0 2d22h 10.104.25.175 4am-node30 <none> <none>
multi-vector-25m-pulsar-recovery-0 1/1 Running 0 2d22h 10.104.34.181 4am-node37 <none> <none>
multi-vector-25m-pulsar-zookeeper-0 1/1 Running 0 2d22h 10.104.18.120 4am-node25 <none> <none>
multi-vector-25m-pulsar-zookeeper-1 1/1 Running 0 2d22h 10.104.25.183 4am-node30 <none> <none>
multi-vector-25m-pulsar-zookeeper-2 1/1 Running 0 2d22h 10.104.16.165 4am-node21 <none> <none>
client pod name: fouramf-wxjlk-3985588210 client monitor:
test steps:
concurrent test and calculation of RT and QPS
:test steps:
1. create collection with fields:
'float_vector': 128dim,
'float_vector_1': 200dim,
'float_vector_2': 128dim,
'float_vector_3': 200dim,
scalar field: id(pk), float_1
2. build indexes:
HNSW: 'float_vector', 'float_vector_1', 'float_vector_2', 'float_vector_3'
DEFAULT index type(STL_SORT): 'id'
3. insert 25 million data
4. flush collection
5. build indexes again using the same params
6. load collection
replica: 1
7. concurrent request:
- insert
- delete
- search
- query
- load
- hybrid_search
- scene_test: 1 vector field, 1 primaryKey field
(collection: create->insert->flush->index->drop)
- scene_hybrid_search_test: 4 vector fields, 3 scalar fields, 1 primaryKey field
(collection: create->insert->flush->index->load->hybrid_search->drop)
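Because each scene task bundles several operations into one reported RT, timing the steps individually would help show which one is growing. The following is a minimal sketch of such a harness, not the actual test client: `run_scene` and `placeholder` are hypothetical names, and `placeholder` stands in for the real Milvus client calls.

```python
import time
from typing import Callable, Dict, List, Tuple


def run_scene(steps: List[Tuple[str, Callable[[], None]]]) -> Dict[str, float]:
    """Run the steps in order and return wall-clock seconds spent in each."""
    timings: Dict[str, float] = {}
    for name, step in steps:
        start = time.perf_counter()
        step()
        timings[name] = time.perf_counter() - start
    return timings


def placeholder() -> None:
    """Hypothetical stand-in for a real client call (create, insert, ...)."""
    time.sleep(0.001)


# Step order taken from the scene_test description above.
scene_test_steps = [
    (name, placeholder) for name in ("create", "insert", "flush", "index", "drop")
]

timings = run_scene(scene_test_steps)
slowest = max(timings, key=timings.get)

for name, seconds in timings.items():
    print(f"{name:8s} {seconds * 1000:8.2f} ms")
print("slowest step:", slowest)
```

With real client calls in place of `placeholder`, logging these per-step timings over the 12h run would show whether the growth sits in the DDL steps (create, drop) or elsewhere.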
test result:
[2024-06-14 22:43:42,594 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-06-14 22:43:42,594 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc delete 11091 0(0.00%) | 67 2 7682 10 | 0.26 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc hybrid_search 224244 0(0.00%) | 53 8 7374 22 | 5.19 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc insert 11329 0(0.00%) | 97 4 7615 20 | 0.26 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc load 11454 2(0.02%) | 1008 6 30011 420 | 0.27 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc query 112006 0(0.00%) | 57 3 8325 11 | 2.59 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc scene_hybrid_search_test 22384 0(0.00%) | 110576 10817 695817 99000 | 0.52 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc scene_test 22206 0(0.00%) | 80556 63290 292848 74000 | 0.51 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc search 224084 0(0.00%) | 60 12 7305 28 | 5.19 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: Aggregated 638798 2(0.00%) | 6746 2 695817 25 | 14.79 0.00 (stats.py:789)
[2024-06-14 22:43:42,596 - INFO - fouram]: (stats.py:790)
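The summary table above can be extracted from the client log with a regular expression, which makes it easier to compare runs. This is a sketch that assumes the single-space layout shown in this thread (the raw log may use wider column padding, which `\s+` also tolerates); the pattern and the 10000 threshold are illustrative choices.

```python
import re

# Matches one per-task row: type, name, request and failure counts,
# avg/min/max/median RT, then req/s and failures/s.
ROW = re.compile(
    r"\]:\s+(?P<type>\S+)\s+(?P<name>\S+)\s+(?P<reqs>\d+)\s+(?P<fails>\d+)\([\d.]+%\)\s+\|"
    r"\s+(?P<avg>\d+)\s+(?P<min>\d+)\s+(?P<max>\d+)\s+(?P<med>\d+)\s+\|"
    r"\s+(?P<rps>[\d.]+)\s+(?P<fail_s>[\d.]+)"
)

# A few lines copied from the log above.
log = """\
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc load 11454 2(0.02%) | 1008 6 30011 420 | 0.27 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc scene_hybrid_search_test 22384 0(0.00%) | 110576 10817 695817 99000 | 0.52 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc scene_test 22206 0(0.00%) | 80556 63290 292848 74000 | 0.51 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: grpc search 224084 0(0.00%) | 60 12 7305 28 | 5.19 0.00 (stats.py:789)
[2024-06-14 22:43:42,595 - INFO - fouram]: Aggregated 638798 2(0.00%) | 6746 2 695817 25 | 14.79 0.00 (stats.py:789)
"""

rows = []
for line in log.splitlines():
    match = ROW.search(line)
    if match:  # header, separator and Aggregated lines do not match
        rows.append({
            "name": match["name"],
            "reqs": int(match["reqs"]),
            "fails": int(match["fails"]),
            "avg": int(match["avg"]),
            "max": int(match["max"]),
        })

# Tasks whose average RT exceeds an arbitrary threshold.
slow = [row["name"] for row in rows if row["avg"] > 10000]
print(slow)
```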
[2024-06-14 22:43:42,603 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': 'helm',
'deploy_mode': 'cluster',
'config_name': 'cluster_2c2m',
'config': {'queryNode': {'resources': {'limits': {'cpu': '16',
'memory': '32Gi'},
'requests': {'cpu': '8',
'memory': '16Gi'}},
'replicas': 6},
'indexNode': {'resources': {'limits': {'cpu': '6.0',
'memory': '4Gi'},
'requests': {'cpu': '4.0',
'memory': '3Gi'}},
'replicas': 4},
'dataNode': {'resources': {'limits': {'cpu': '2.0',
'memory': '2Gi'},
'requests': {'cpu': '2.0',
'memory': '2Gi'}},
'replicas': 2},
'cluster': {'enabled': True},
'pulsar': {},
'kafka': {},
'minio': {'metrics': {'podMonitor': {'enabled': True}}},
'etcd': {'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}},
'metrics': {'serviceMonitor': {'enabled': True}},
'log': {'level': 'debug'},
'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus',
'tag': '2.4-20240614-fd1c7b1a-amd64'}}},
'host': 'multi-vector-25m-milvus.qa-milvus.svc.cluster.local',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_concurrent_locust_25m_multi_hnsw_ddl_dql_dml_cluster',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 128,
'scalars_index': {'id': {}},
'vectors_index': {'float_vector_1': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'},
'float_vector_2': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'},
'float_vector_3': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'}},
'scalars_params': {'float_vector_1': {'params': {'dim': 200},
'other_params': {'dataset': 'text2img'}},
'float_vector_2': {'params': {'dim': 128},
'other_params': {'dataset': 'sift'}},
'float_vector_3': {'params': {'dim': 200},
'other_params': {'dataset': 'text2img'}}},
'dataset_name': 'sift',
'dataset_size': 25000000,
'ni_per': 10000},
'collection_params': {'other_fields': ['float_vector_1',
'float_vector_2',
'float_vector_3',
'float_1'],
'shards_num': 2},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200}},
'concurrent_params': {'concurrent_number': 100,
'during_time': '12h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'insert',
'weight': 1,
'params': {'nb': 1,
'timeout': 600,
'random_id': True,
'random_vector': True,
'varchar_filled': False,
'start_id': 25000000}},
{'type': 'delete',
'weight': 1,
'params': {'expr': '',
'delete_length': 1,
'timeout': 30}},
{'type': 'search',
'weight': 20,
'params': {'nq': 10,
'top_k': 10,
'search_param': {'ef': 32},
'expr': {'float_1': {'GT': -1.0,
'LT': 12500000.0}},
'guarantee_timestamp': None,
'partition_names': None,
'output_fields': ['float_1',
'float_vector_1'],
'ignore_growing': False,
'group_by_field': None,
'timeout': 600,
'random_data': True}},
{'type': 'query',
'weight': 10,
'params': {'ids': None,
'expr': {'float_1': {'GT': 0,
'LT': 100}},
'output_fields': None,
'offset': None,
'limit': None,
'ignore_growing': False,
'partition_names': None,
'timeout': 600,
'random_data': False,
'random_count': 0,
'random_range': [0,
1],
'field_name': 'id',
'field_type': 'int64'}},
{'type': 'load',
'weight': 1,
'params': {'replica_number': 1,
'timeout': 30}},
{'type': 'scene_test',
'weight': 2,
'params': {'dim': 128,
'data_size': 3000,
'nb': 3000,
'index_type': 'IVF_SQ8',
'index_param': {'nlist': 2048},
'metric_type': 'L2',
'other_fields': [],
'scalars_params': {},
'scalars_index': {},
'vectors_index': {}}},
{'type': 'hybrid_search',
'weight': 20,
'params': {'nq': 1,
'top_k': 10,
'reqs': [{'search_param': {'ef': 128},
'anns_field': 'float_vector',
'top_k': 100},
{'search_param': {'ef': 64},
'anns_field': 'float_vector_1',
'top_k': 10},
{'search_param': {'ef': 256},
'anns_field': 'float_vector_2',
'top_k': 200},
{'search_param': {'ef': 64},
'anns_field': 'float_vector_3',
'top_k': 30}],
'rerank': {'WeightedRanker': [0.85,
0.95,
0.5,
0.5]},
'output_fields': ['*'],
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 600,
'random_data': True}},
{'type': 'scene_hybrid_search_test',
'weight': 2,
'params': {'nq': 1,
'top_k': 1,
'reqs': [{'search_param': {'nprobe': 128},
'anns_field': 'float_vector',
'top_k': 100},
{'search_param': {'nprobe': 32},
'anns_field': 'float_vector_1',
'top_k': 10},
{'search_param': {'ef': 32},
'anns_field': 'float_vector_2',
'top_k': 5},
{'search_param': {'search_list': 20},
'anns_field': 'float_vector_3',
'top_k': 10}],
'rerank': {'RRFRanker': []},
'output_fields': None,
'ignore_growing': False,
'guarantee_timestamp': None,
'partition_names': None,
'timeout': 600,
'random_data': True,
'dataset': 'local',
'dim': 128,
'shards_num': 2,
'data_size': 3000,
'nb': 3000,
'index_type': 'IVF_SQ8',
'index_param': {'nlist': 2048},
'metric_type': 'L2',
'other_fields': ['float_vector_1',
'float_vector_2',
'float_vector_3',
'int64_1',
'bool_1',
'varchar_1'],
'replica_number': 1,
'scalars_params': {'float_vector_1': {'params': {'dim': 128},
'other_params': {'dataset': 'sift'}},
'float_vector_2': {'params': {'dim': 128},
'other_params': {'dataset': 'sift'}},
'float_vector_3': {'params': {'dim': 128},
'other_params': {'dataset': 'sift'}}},
'scalars_index': {'int64_1': {},
'bool_1': {'index_type': 'INVERTED'},
'varchar_1': {'index_type': 'INVERTED'}},
'vectors_index': {'float_vector_1': {'index_type': 'IVF_FLAT',
'index_param': {'nlist': 1024},
'metric_type': 'L2'},
'float_vector_2': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200},
'metric_type': 'L2'},
'float_vector_3': {'index_type': 'DISKANN',
'index_param': {},
'metric_type': 'IP'}},
'prepare_before_insert': False,
'hybrid_search_counts': 10,
'new_connect': False,
'new_user': False}}]},
'run_id': 2024061455048265,
'datetime': '2024-06-14 08:58:24.563525',
'client_version': '2.2'},
'result': {'test_result': {'index': {'RT': 797.7971,
'float_vector_1': {'RT': 314.0947},
'float_vector_2': {'RT': 160.9543},
'float_vector_3': {'RT': 56.7346},
'id': {'RT': 0.5175}},
'insert': {'total_time': 4500.9088,
'VPS': 5554.4338,
'batch_time': 1.8004,
'batch': 10000},
'flush': {'RT': 2.5759},
'load': {'RT': 26.6893},
'Locust': {'Aggregated': {'Requests': 638798,
'Fails': 2,
'RPS': 14.79,
'fail_s': 0.0,
'RT_max': 695817.7,
'RT_avg': 6746.23,
'TP50': 25,
'TP99': 142000.0},
'delete': {'Requests': 11091,
'Fails': 0,
'RPS': 0.26,
'fail_s': 0.0,
'RT_max': 7682.49,
'RT_avg': 67.31,
'TP50': 10,
'TP99': 1300.0},
'hybrid_search': {'Requests': 224244,
'Fails': 0,
'RPS': 5.19,
'fail_s': 0.0,
'RT_max': 7374.99,
'RT_avg': 53.59,
'TP50': 22,
'TP99': 780.0},
'insert': {'Requests': 11329,
'Fails': 0,
'RPS': 0.26,
'fail_s': 0.0,
'RT_max': 7615.48,
'RT_avg': 97.93,
'TP50': 20,
'TP99': 1600.0},
'load': {'Requests': 11454,
'Fails': 2,
'RPS': 0.27,
'fail_s': 0.0,
'RT_max': 30011.03,
'RT_avg': 1008.53,
'TP50': 420.0,
'TP99': 8400.0},
'query': {'Requests': 112006,
'Fails': 0,
'RPS': 2.59,
'fail_s': 0.0,
'RT_max': 8325.88,
'RT_avg': 57.44,
'TP50': 11,
'TP99': 1000.0},
'scene_hybrid_search_test': {'Requests': 22384,
'Fails': 0,
'RPS': 0.52,
'fail_s': 0.0,
'RT_max': 695817.7,
'RT_avg': 110576.9,
'TP50': 99000.0,
'TP99': 356000.0},
'scene_test': {'Requests': 22206,
'Fails': 0,
'RPS': 0.51,
'fail_s': 0.0,
'RT_max': 292848.61,
'RT_avg': 80556.57,
'TP50': 74000.0,
'TP99': 153000.0},
'search': {'Requests': 224084,
'Fails': 0,
'RPS': 5.19,
'fail_s': 0.0,
'RT_max': 7305.05,
'RT_avg': 60.82,
'TP50': 28,
'TP99': 850.0}}}}}
Is there an existing issue for this?
Environment
Current Behavior
argo task: fouramf-multi-vector-d5bqx test case name: test_concurrent_locust_50m_multi_ivf_sq8_ddl_dql_standalone
server:
CreateCollection
DropCollection
DescribeCollection
DescribeIndex
GetCollectionStatistics
Flush
LoadCollection
GetLoadState
client pod name: fouramf-multi-vector-d5bqx-3501960532 client monitor:
Expected Behavior
No response
Steps To Reproduce
Milvus Log
No response
Anything else?
test result: