Open wangting0128 opened 4 days ago
/assign @XuanYang-cn /unassign
Looking at the code, I found that, for now, even a simple sequence (create collection, add partition, search on that partition) on a multi-node deployment can hit a bug, since the meta cache is never updated in the add-partition case.
@wangting0128 can you try to reproduce on a simpler case with 2 proxies?
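To make the suspected failure mode concrete, here is a minimal toy model, not Milvus code. The `Coordinator` and `Proxy` classes and the `refresh_on_miss` flag are hypothetical names invented for illustration. It assumes each proxy keeps its own partition cache and that creating a partition through one proxy does not invalidate the cache held by the other.

```python
class Coordinator:
    """Hypothetical stand-in for the component that owns partition metadata."""

    def __init__(self):
        self.partitions = {}  # collection name -> set of partition names

    def create_collection(self, coll):
        self.partitions[coll] = {"_default"}

    def create_partition(self, coll, part):
        self.partitions[coll].add(part)

    def describe_partitions(self, coll):
        return set(self.partitions[coll])


class Proxy:
    """Hypothetical proxy with a private, per-process meta cache."""

    def __init__(self, coord, refresh_on_miss=False):
        self.coord = coord
        self.cache = {}  # collection name -> cached set of partition names
        self.refresh_on_miss = refresh_on_miss

    def create_partition(self, coll, part):
        self.coord.create_partition(coll, part)
        # Assumption: only the local cache is dropped; peers are not notified.
        self.cache.pop(coll, None)

    def search(self, coll, part):
        if coll not in self.cache:
            self.cache[coll] = self.coord.describe_partitions(coll)
        if part not in self.cache[coll] and self.refresh_on_miss:
            # One possible mitigation: re-read metadata before failing.
            self.cache[coll] = self.coord.describe_partitions(coll)
        if part not in self.cache[coll]:
            raise LookupError(f"partition {part} not found")
        return f"searched {coll}/{part}"


coord = Coordinator()
coord.create_collection("c")
proxy_a = Proxy(coord)
proxy_b = Proxy(coord)

proxy_b.search("c", "_default")        # warms proxy_b's cache
proxy_a.create_partition("c", "p1")    # proxy_b's cache is now stale

result_a = proxy_a.search("c", "p1")   # works: proxy_a dropped its own cache
try:
    proxy_b.search("c", "p1")
    stale_failed = False
except LookupError:
    stale_failed = True                # proxy_b still sees only "_default"
```

In this model the search only fails when the request lands on a proxy whose cache was warmed before the partition was added, which would be consistent with needing at least two proxies to reproduce. Whether the real proxy behaves this way is exactly what the reproduction should confirm.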
/assign @xiaofan-luan
Deployed Milvus with two proxies and ran the concurrent multi-partition scenario (per partition: create -> insert -> flush -> index again -> load -> search -> release -> search failed -> drop). The problem did not reoccur.
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
envoy-verify-37173-9f5f79c86-zngsk 1/1 Running 0 3h32m 10.104.24.214 4am-node29 <none> <none>
verify-37173-etcd-0 1/1 Running 0 3h42m 10.104.30.187 4am-node38 <none> <none>
verify-37173-etcd-1 1/1 Running 0 3h42m 10.104.26.201 4am-node32 <none> <none>
verify-37173-etcd-2 1/1 Running 0 3h42m 10.104.32.120 4am-node39 <none> <none>
verify-37173-milvus-datanode-57cc457794-nbvb7 1/1 Running 3 (3h37m ago) 3h42m 10.104.20.152 4am-node22 <none> <none>
verify-37173-milvus-indexnode-6f64fc677f-bd9fw 1/1 Running 2 (3h42m ago) 3h42m 10.104.9.69 4am-node14 <none> <none>
verify-37173-milvus-mixcoord-86df6b57bb-w7pkg 1/1 Running 3 (3h37m ago) 3h42m 10.104.9.66 4am-node14 <none> <none>
verify-37173-milvus-proxy-7479b569d4-p5hq8 1/1 Running 3 (3h37m ago) 3h42m 10.104.9.65 4am-node14 <none> <none>
verify-37173-milvus-proxy-7479b569d4-t8gfd 1/1 Running 2 (3h38m ago) 3h42m 10.104.26.195 4am-node32 <none> <none>
verify-37173-milvus-querynode-6db5c4d57-mdhs6 1/1 Running 2 (3h42m ago) 3h42m 10.104.6.49 4am-node13 <none> <none>
verify-37173-minio-0 1/1 Running 0 3h42m 10.104.20.157 4am-node22 <none> <none>
verify-37173-minio-1 1/1 Running 0 3h42m 10.104.26.202 4am-node32 <none> <none>
verify-37173-minio-2 1/1 Running 0 3h42m 10.104.30.189 4am-node38 <none> <none>
verify-37173-minio-3 1/1 Running 0 3h42m 10.104.32.122 4am-node39 <none> <none>
verify-37173-pulsar-bookie-0 1/1 Running 0 3h42m 10.104.30.190 4am-node38 <none> <none>
verify-37173-pulsar-bookie-1 1/1 Running 0 3h42m 10.104.26.203 4am-node32 <none> <none>
verify-37173-pulsar-bookie-2 1/1 Running 0 3h42m 10.104.32.125 4am-node39 <none> <none>
verify-37173-pulsar-bookie-init-8l2pf 0/1 Completed 0 3h42m 10.104.9.68 4am-node14 <none> <none>
verify-37173-pulsar-broker-0 1/1 Running 0 3h42m 10.104.5.17 4am-node12 <none> <none>
verify-37173-pulsar-proxy-0 1/1 Running 0 3h42m 10.104.26.196 4am-node32 <none> <none>
verify-37173-pulsar-pulsar-init-6m888 0/1 Completed 0 3h42m 10.104.9.67 4am-node14 <none> <none>
verify-37173-pulsar-recovery-0 1/1 Running 0 3h42m 10.104.9.64 4am-node14 <none> <none>
verify-37173-pulsar-zookeeper-0 1/1 Running 0 3h42m 10.104.30.188 4am-node38 <none> <none>
verify-37173-pulsar-zookeeper-1 1/1 Running 0 3h41m 10.104.26.209 4am-node32 <none> <none>
verify-37173-pulsar-zookeeper-2 1/1 Running 0 3h41m 10.104.24.145 4am-node29 <none> <none>
client result:
[2024-10-29 06:56:32,345 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2024-10-29 06:56:32,345 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-10-29 06:56:32,345 - INFO - fouram]: grpc scene_test_partition 724 0(0.00%) | 294928 80210 684162 287000 | 0.07 0.00 (stats.py:789)
[2024-10-29 06:56:32,345 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-10-29 06:56:32,345 - INFO - fouram]: Aggregated 724 0(0.00%) | 294928 80210 684162 287000 | 0.07 0.00 (stats.py:789)
[2024-10-29 06:56:32,345 - INFO - fouram]: (stats.py:790)
[2024-10-29 06:56:32,346 - INFO - fouram]: [PerfTemplate] Report data:
{'server': {'deploy_tool': '', 'deploy_mode': '', 'config_name': '', 'config': {}, 'host': 'envoy-verify-37173.qa-milvus', 'port': 19530, 'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_concurrent_locust_custom_parameters',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 128,
'scalars_index': {'id': {}, 'int64_1': {'index_type': 'INVERTED'}, 'varchar_1': {'index_type': 'INVERTED'}},
'vectors_index': {'float_vector_1': {'index_type': 'HNSW',
'index_param': {'M': 8, 'efConstruction': 200},
'metric_type': 'L2'},
'float_vector_2': {'index_type': 'DISKANN', 'index_param': {}, 'metric_type': 'IP'},
'float_vector_3': {'index_type': 'IVF_SQ8',
'index_param': {'nlist': 2048},
'metric_type': 'L2'}},
'scalars_params': {'float_vector_1': {'params': {'dim': 128}, 'other_params': {'dataset': 'sift'}},
'float_vector_2': {'params': {'dim': 128}, 'other_params': {'dataset': 'sift'}},
'float_vector_3': {'params': {'dim': 128}, 'other_params': {'dataset': 'sift'}}},
'extra_partitions': {'partitions': ['_default', 'partition_1', 'partition_2', 'partition_3', 'partition_4',
'partition_5', 'partition_6', 'partition_7', 'partition_8',
'partition_9'],
'data_repeated': False},
'dataset_name': 'sift',
'dataset_size': 1000000,
'ni_per': 10000},
'collection_params': {'other_fields': ['float_vector_1', 'float_vector_2', 'float_vector_3', 'int64_1', 'varchar_1'],
'shards_num': 2},
'index_params': {'index_type': 'IVF_FLAT', 'index_param': {'nlist': 1024}},
'concurrent_params': {'concurrent_number': 20, 'during_time': '3h', 'interval': 20, 'spawn_rate': None},
'concurrent_tasks': [{'type': 'scene_test_partition',
'weight': 1,
'params': {'data_size': 3000,
'ni': 3000,
'nq': 1,
'search_param': {'nprobe': 64},
'limit': 1,
'output_fields': ['*'],
'timeout': 600}}]},
'run_id': 2024102934536792,
'datetime': '2024-10-29 03:44:13.664626',
'client_version': '2.2'},
'result': {'test_result': {'index': {'RT': 445.5318,
'float_vector_1': {'RT': 2.8603},
'float_vector_2': {'RT': 10.2972},
'float_vector_3': {'RT': 2.7067},
'id': {'RT': 1.1195},
'int64_1': {'RT': 1.8185},
'varchar_1': {'RT': 0.7767}},
'insert': {'total_time': 241.4433, 'VPS': 4779.1453, 'batch_time': 2.4144, 'batch': 10000.0},
'flush': {'RT': 2.5434},
'load': {'RT': 6.1546},
'Locust': {'Aggregated': {'Requests': 724,
'Fails': 0,
'RPS': 0.07,
'fail_s': 0.0,
'RT_max': 684162.23,
'RT_avg': 294928.07,
'TP50': 287000.0,
'TP99': 537000.0},
'scene_test_partition': {'Requests': 724,
'Fails': 0,
'RPS': 0.07,
'fail_s': 0.0,
'RT_max': 684162.23,
'RT_avg': 294928.07,
'TP50': 287000.0,
'TP99': 537000.0}}}}}
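As a quick sanity check on the report above, the aggregate numbers can be cross-checked against each other. This is only a sketch: it assumes the response times are in milliseconds (the Locust default) and that the run lasted the configured `during_time` of 3 hours.

```python
# Values copied from the report above.
requests = 724
concurrency = 20               # concurrent_number
duration_s = 3 * 3600          # during_time '3h' (assumed to be the actual run length)
rt_avg_s = 294928.07 / 1000.0  # RT_avg, assumed to be in milliseconds

# Throughput as observed: completed requests over wall-clock time.
observed_rps = requests / duration_s

# Throughput predicted by Little's law: concurrency / average latency.
predicted_rps = concurrency / rt_avg_s

# Number of scenario iterations that throughput would yield over the run.
predicted_requests = predicted_rps * duration_s

print(f"observed  RPS: {observed_rps:.3f}")
print(f"predicted RPS: {predicted_rps:.3f}")
print(f"predicted requests: {predicted_requests:.0f} (reported: {requests})")
```

Both rates round to the reported 0.07 req/s. The predicted count is slightly above the 724 reported, which would be expected if iterations still in flight at the end of the run were not counted. So the run looks internally consistent, with each full partition scenario taking roughly five minutes.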
@wangting0128 did you try creating a partition and searching multiple times to see what's going on there?
Is there an existing issue for this?
Environment
Current Behavior
argo task: multi-vector-corn-1-1729864800
test case name: test_hybrid_search_locust_dql_dml_partition_cluster
server:
milvus log:
client log:
Expected Behavior
No response
Steps To Reproduce
Milvus Log
No response
Anything else?
test result: