milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [benchmark][standalone] upsert data with existing primary keys, the result of query count(*) is less than expected #37238

Closed. wangting0128 closed this issue 2 weeks ago

wangting0128 commented 4 weeks ago

Is there an existing issue for this?

Environment

- Milvus version: 2.5-20241028-7134526d-amd64
- Deployment mode (standalone or cluster): standalone
- MQ type (rocksmq, pulsar or kafka): rocksmq
- SDK version (e.g. pymilvus v2.0.0rc2): 2.5.0rc97
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: upsert-count-1730142000

server:

NAME                                                              READY   STATUS      RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
upsert-count-1742000-1-2-8305-etcd-0                              1/1     Running     0                5h55m   10.104.30.173   4am-node38   <none>           <none>
upsert-count-1742000-1-2-8305-milvus-standalone-74b6954865sn4wf   1/1     Running     0                5h55m   10.104.34.24    4am-node37   <none>           <none>
upsert-count-1742000-1-2-8305-minio-59d5cb5db6-8pwm7              1/1     Running     0                5h55m   10.104.30.177   4am-node38   <none>           <none>

client log: query result check errors, found with {pod=~"upsert-count-1730142000-1960021867"} |~ "ERROR" |~ "ClientTask"

[Screenshot 2024-10-29 11.50.29]

Expected Behavior

No response

Steps To Reproduce

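1. deploy a standalone milvus and override quotaAndLimits (dml upsertRate, insertRate and deleteRate max: 0.5)
2. create a collection with 2 shards and 16 partitions, with fields: 'id', 'float_vector', 'varchar_1' (is_partition_key)
3. build an HNSW index
4. insert 2m rows (sift, dim 128)
5. flush the collection
6. rebuild the HNSW index
7. load the collection
8. run concurrent requests for 5h:
   - query: 'count(*)'  <- expected 2m, but got fewer than 2m
   - upsert: ids 1 ~ 2000 (primary keys that already exist)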
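The invariant the benchmark checks can be modeled in a few lines. This is a hypothetical in-memory sketch, not Milvus code: it assumes upsert behaves as delete-then-insert keyed on the primary key, assumes the inserted ids start at 0, and scales the 2m rows down to 20,000. Because every upserted id (1 ~ 2000) already exists, count(*) should stay constant no matter how many upsert rounds run.

```python
# Hypothetical in-memory model of the benchmark's invariant (not Milvus code).
# Assumption: upsert = delete the existing row with this primary key, then
# insert the new row, so the number of live primary keys cannot change when
# every upserted id already exists.

def make_collection(n):
    """Return a dict mapping primary key -> row payload for ids 0..n-1."""
    return {pk: f"row-{pk}-v0" for pk in range(n)}

def upsert(collection, ids, version):
    """Replace the row for each id (modeled as delete + insert)."""
    for pk in ids:
        collection.pop(pk, None)                 # delete the existing row, if any
        collection[pk] = f"row-{pk}-v{version}"  # insert the new version

def count_star(collection):
    """Stand-in for a count(*) query on this model."""
    return len(collection)

N = 20_000                    # scaled down from the 2m rows in the test
coll = make_collection(N)
for version in range(1, 6):   # several rounds, like the repeated upsert task
    upsert(coll, range(1, 2001), version)  # ids 1 ~ 2000, all already present
    assert count_star(coll) == N           # the count must never drop
print(count_star(coll))
```

Any count(*) below N while only existing keys are being upserted therefore points at rows being lost on the server side, which is what the query check in this issue flagged.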

Milvus Log

No response

Anything else?

test result:

[2024-10-29 01:01:53,540 -  INFO - fouram]: Print locust final stats. (locust_runner.py:56)
[2024-10-29 01:01:53,540 -  INFO - fouram]: Type     Name                                                                          # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s (stats.py:789)
[2024-10-29 01:01:53,541 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-10-29 01:01:53,541 -  INFO - fouram]: grpc     query                                                                           7893 4180(52.96%) |      2       0      52      0 |    0.44        0.23 (stats.py:789)
[2024-10-29 01:01:53,541 -  INFO - fouram]: grpc     upsert                                                                          7662     0(0.00%) |   2342     107    4236   1500 |    0.43        0.00 (stats.py:789)
[2024-10-29 01:01:53,541 -  INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2024-10-29 01:01:53,541 -  INFO - fouram]:          Aggregated                                                                     15555 4180(26.87%) |   1155       0    4236     18 |    0.86        0.23 (stats.py:789)
[2024-10-29 01:01:53,541 -  INFO - fouram]:  (stats.py:790)
[2024-10-29 01:01:53,542 -  INFO - fouram]: [PerfTemplate] Report data: 
{'server': {'deploy_tool': 'helm',
            'deploy_mode': 'standalone',
            'config_name': 'standalone_8c16m',
            'config': {'standalone': {'resources': {'limits': {'cpu': 8, 'memory': '16Gi'}, 'requests': {'cpu': 8, 'memory': '16Gi'}},
                                      'profiling': {'enabled': True}},
                       'cluster': {'enabled': False},
                       'etcd': {'replicaCount': 1, 'metrics': {'enabled': True, 'podMonitor': {'enabled': True}}},
                       'minio': {'mode': 'standalone', 'metrics': {'podMonitor': {'enabled': True}}},
                       'pulsar': {'enabled': False},
                       'metrics': {'serviceMonitor': {'enabled': True}},
                       'log': {'level': 'debug'},
                       'extraConfigFiles': {'user.yaml': 'quotaAndLimits:\n'
                                                         '  dml:\n'
                                                         '    enabled: true\n'
                                                         '    upsertRate:\n'
                                                         '      max: 0.5\n'
                                                         '    insertRate:\n'
                                                         '      max: 0.5\n'
                                                         '    deleteRate:\n'
                                                         '      max: 0.5\n'
                                                         '  quotaCenterCollectInterval: 1\n'
                                                         '\n'},
                       'image': {'all': {'repository': 'harbor.milvus.io/milvus/milvus', 'tag': '2.5-20241028-7134526d-amd64'}}},
            'host': 'upsert-count-1742000-1-2-8305-milvus.qa-milvus.svc.cluster.local',
            'port': '19530',
            'uri': ''},
 'client': {'test_case_type': 'ConcurrentClientBase',
            'test_case_name': 'test_concurrent_locust_custom_parameters',
            'test_case_params': {'dataset_params': {'metric_type': 'L2',
                                                    'dim': 128,
                                                    'scalars_params': {'varchar_1': {'params': {'is_partition_key': True, 'max_length': 100}}},
                                                    'dataset_name': 'sift',
                                                    'dataset_size': '2m',
                                                    'ni_per': 5000},
                                 'collection_params': {'other_fields': ['varchar_1'], 'shards_num': 2, 'num_partitions': 16},
                                 'load_params': {},
                                 'release_params': {},
                                 'query_params': {},
                                 'search_params': {},
                                 'resource_groups_params': {'reset': False},
                                 'database_user_params': {'reset_rbac': False, 'reset_db': False},
                                 'index_params': {'index_type': 'HNSW', 'index_param': {'M': 8, 'efConstruction': 200}},
                                 'concurrent_params': {'concurrent_number': 1, 'during_time': '5h', 'interval': 20, 'spawn_rate': None},
                                 'concurrent_tasks': [{'type': 'query',
                                                       'weight': 1,
                                                       'params': {'expr': '',
                                                                  'output_fields': ['count(*)'],
                                                                  'check_task': 'check_query_output_count',
                                                                  'check_items': {'query_count': 2000000}}},
                                                      {'type': 'upsert', 'weight': 1, 'params': {'nb': 2000, 'random_id': False, 'start_id': 1}}]},
            'run_id': 2024102824047258,
            'datetime': '2024-10-28 19:06:44.003933',
            'client_version': '2.5.0'},
 'result': {'test_result': {'index': {'RT': 191.2608},
                            'insert': {'total_time': 2158.6659, 'VPS': 926.4982, 'batch_time': 5.3967, 'batch': 5000},
                            'flush': {'RT': 3.0322},
                            'load': {'RT': 3.4685},
                            'Locust': {'Aggregated': {'Requests': 15555,
                                                      'Fails': 4180,
                                                      'RPS': 0.86,
                                                      'fail_s': 0.27,
                                                      'RT_max': 4236.01,
                                                      'RT_avg': 1155.18,
                                                      'TP50': 18,
                                                      'TP99': 4000.0},
                                       'query': {'Requests': 7893,
                                                 'Fails': 4180,
                                                 'RPS': 0.44,
                                                 'fail_s': 0.53,
                                                 'RT_max': 52.86,
                                                 'RT_avg': 2.64,
                                                 'TP50': 0,
                                                 'TP99': 20},
                                       'upsert': {'Requests': 7662,
                                                  'Fails': 0,
                                                  'RPS': 0.43,
                                                  'fail_s': 0.0,
                                                  'RT_max': 4236.01,
                                                  'RT_avg': 2342.47,
                                                  'TP50': 1500.0,
                                                  'TP99': 4100.0}}}}}
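The headline numbers in the report can be re-derived from the raw counts, which helps read the failure columns correctly. A small arithmetic check, assuming the Locust run lasted exactly the configured during_time of 5h (18,000 s):

```python
# Raw counts copied from the Locust stats and PerfTemplate report above.
query_reqs, query_fails = 7893, 4180
upsert_reqs, upsert_fails = 7662, 0
rows, insert_total_s, batch_size, batch_time_s = 2_000_000, 2158.6659, 5000, 5.3967
duration_s = 5 * 60 * 60  # assumption: the run lasted exactly during_time = "5h"

total_reqs = query_reqs + upsert_reqs
summary = {
    "total_reqs": total_reqs,
    "query_fail_pct": round(100 * query_fails / query_reqs, 2),
    "aggregated_fail_pct": round(100 * (query_fails + upsert_fails) / total_reqs, 2),
    "query_rps": round(query_reqs / duration_s, 2),
    "upsert_rps": round(upsert_reqs / duration_s, 2),
    "aggregated_rps": round(total_reqs / duration_s, 2),
    "query_failures_per_s": round(query_fails / duration_s, 2),
    "insert_vps": round(rows / insert_total_s, 2),
    "batches": rows // batch_size,
    "insert_time_from_batches_s": round(rows // batch_size * batch_time_s, 2),
}
for key, value in summary.items():
    print(f"{key}: {value}")
```

Read this way, the fail_s entries in the report dict (0.53 for query, 0.27 aggregated) appear to be failure ratios rather than per-second rates, while the Locust table's failures/s column (0.23) is a rate. All upserts succeeded; the failures are confined to the count(*) query.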
xiaofan-luan commented 4 weeks ago

/assign @aoiasd please help with this

aoiasd commented 3 weeks ago

Merge sort of segments exits early when it encounters data that was deleted or has expired, which causes data loss. Partial fix: https://github.com/milvus-io/milvus/pull/37310 (needs cherry-pick)
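The failure mode described above can be illustrated with a simplified k-way merge. This is a hypothetical Python sketch, not the Milvus Go implementation; every name in it is made up for illustration. It assumes an upsert at time T leaves a delete record for the primary key plus a new row (pk, T), so the old version of each upserted id (1 ~ 2000) is a deleted row sitting in the middle of the sorted stream. Expired (TTL) rows are not modeled; they would go through the same is_dropped check. The comment does not say whether the real code abandoned the whole merge or only one input segment; this sketch stops the whole merge, the simplest form of the failure.

```python
import heapq

# Hypothetical, simplified model of merge-sorting segments (not Milvus's Go code).
# A row is (primary_key, timestamp); each segment is sorted by (pk, ts).
# An upsert at time T is modeled as a delete record {pk: T} plus a new row (pk, T).

old_segment = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]  # original inserts
upsert_segment = [(1, 10), (2, 10)]                             # upserted versions
deletes = {1: 10, 2: 10}                                        # pk -> delete ts

def is_dropped(row):
    """A row is dropped if a delete for its pk happened after it was written."""
    pk, ts = row
    return pk in deletes and ts < deletes[pk]

def merge_sort_early_exit(segments):
    """Buggy variant: stops merging at the first dropped row."""
    out = []
    for row in heapq.merge(*segments):
        if is_dropped(row):
            break  # BUG: every live row after this point is lost
        out.append(row)
    return out

def merge_sort_skip(segments):
    """Correct variant: skips only the dropped row and keeps merging."""
    return [row for row in heapq.merge(*segments) if not is_dropped(row)]

segments = [old_segment, upsert_segment]
buggy = merge_sort_early_exit(segments)
fixed = merge_sort_skip(segments)
print(len(buggy), len(fixed))  # 1 6
```

With 6 distinct primary keys, count(*) over the merged output should be 6; the early-exit variant keeps only the rows before the first deleted version, which matches the "fewer than expected" symptom.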

wangting0128 commented 2 weeks ago

verification passed

argo task: upsert-count-1730919600, image: master-20241106-8275e40f-amd64