milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [Nightly] The query result is wrong after delete half the entities and compact #37574

Closed NicoYuan1986 closed 9 minutes ago

NicoYuan1986 commented 2 days ago

Environment

- Milvus version: f42869c
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

The query result is wrong after deleting half of the entities and compacting: `count(*)` still returns 2000 even though 1000 of the 2000 inserted entities were deleted.


[pytest : test] [2024-11-10 20:20:11 - DEBUG - ci_test]: (api_request)  : [Collection.insert] args: [      int64   float varchar                         json_field                                       float_vector
[pytest : test] 0         0     0.0       0        {'number': 0, 'float': 0.0}  [0.10080389434530546, 0.062299392725531276, 0....
[pytest : test] 1         1     1.0       1        {'number': 1, 'float': 1.0}  [0.1556......, kwargs: {'timeout': 180} (api_request.py:62)
[pytest : test] [2024-11-10 20:20:11 - DEBUG - ci_test]: (api_response) : (insert count: 2000, delete count: 0, upsert count: 0, timestamp: 453842045914841090, success count: 2000, err count: 0  (api_request.py:37)
[pytest : test] [2024-11-10 20:20:11 - DEBUG - ci_test]: (api_request)  : [Collection.delete] args: ['int64 in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74......, kwargs: {} (api_request.py:62)
[pytest : test] [2024-11-10 20:20:11 - DEBUG - ci_test]: (api_response) : (insert count: 0, delete count: 1000, upsert count: 0, timestamp: 453842045914841096, success count: 0, err count: 0  (api_request.py:37)
[pytest : test] [2024-11-10 20:20:11 - DEBUG - ci_test]: (api_request)  : [Collection.flush] args: [], kwargs: {'timeout': 180} (api_request.py:62)
[pytest : test] [2024-11-10 20:20:14 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)
[pytest : test] [2024-11-10 20:20:14 - DEBUG - ci_test]: (api_request)  : [Collection.compact] args: [False, 180], kwargs: {} (api_request.py:62)
[pytest : test] [2024-11-10 20:20:14 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)
[pytest : test] [2024-11-10 20:20:15 - DEBUG - ci_test]: (api_request)  : [Collection.load] args: [None, 1, 180], kwargs: {} (api_request.py:62)
[pytest : test] [2024-11-10 20:20:16 - DEBUG - ci_test]: (api_response) : None  (api_request.py:37)
[pytest : test] [2024-11-10 20:20:16 - DEBUG - ci_test]: (api_request)  : [Collection.query] args: ['int64 >= 0', ['count(*)'], None, 180], kwargs: {} (api_request.py:62)
[pytest : test] [2024-11-10 20:20:17 - DEBUG - ci_test]: (api_response) : data: ["{'count(*)': 2000}"]   (api_request.py:37)

Expected Behavior

The count should be half the number of inserted entities (1000 of 2000).
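
The expected value follows from the counts in the log above (2000 inserted, 1000 deleted). As a rough sketch in plain Python, not pymilvus: the key range 0..1999 and the choice of 0..999 as the deleted keys are assumptions, because the delete expression is truncated in the log.

```python
# Toy model of the logged sequence; plain Python, no Milvus involved.
# Assumption: primary keys are 0..1999 and the (truncated) delete expression
# covers keys 0..999. The log itself only confirms the counts 2000 and 1000.
inserted = set(range(2000))
deleted = set(range(1000))


def count_matching(live_keys, predicate):
    """Count keys that pass the filter, analogous to query('int64 >= 0', ['count(*)'])."""
    return sum(1 for key in live_keys if predicate(key))


expected = count_matching(inserted - deleted, lambda key: key >= 0)
observed = 2000  # value returned by the query in the log

print(f"expected count(*) = {expected}, observed = {observed}")
```

The observed value equals the count before any deletion, which suggests the deletes were not applied to the loaded data at query time.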

Steps To Reproduce

No response

Milvus Log

  1. link: https://jenkins.milvus.io:18080/blue/organizations/jenkins/Milvus%20Nightly%20CI(new)/detail/master/178/pipeline/114/
  2. log: artifacts-milvus-standalone-ms-master-178-py-n-178-e2e-logs.tar.gz
  3. failed time: [pytest : test] [gw1] [ 52%] FAILED testcases/test_query.py::TestQueryCount::test_count_compact_delete

Anything else?

No response

yanliang567 commented 2 days ago

/assign @XuanYang-cn /unassign

xiaocai2333 commented 1 day ago

It seems to be related to PR #37385. From the logs, I noticed that Milvus loads segments in the following order: first Growing, then L0, and then L1/L2. The changes in that PR cause some L1 segments to load as Growing, so they do not reference the deletion messages in L0.

I will continue to investigate why Growing is loaded first.

xiaocai2333 commented 1 day ago

I found the reason. When loading L0 during WatchDmChannels, the L0 segment was retrieved from the wrong field. FlushedSegmentIds does not contain any L0 segments; they are actually in LevelZeroSegmentIds.

I will fix it.
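
The field mix-up described in this comment can be illustrated with a small toy model. This is only a sketch in Python, not the Milvus code: `ChannelSegments`, both helper functions, and the segment IDs are hypothetical, and the snake_case attributes merely mirror the `FlushedSegmentIds` and `LevelZeroSegmentIds` fields named above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChannelSegments:
    """Hypothetical stand-in for the segment ID lists attached to a channel."""
    flushed_segment_ids: List[int] = field(default_factory=list)
    level_zero_segment_ids: List[int] = field(default_factory=list)


def pick_l0_from_flushed(info: ChannelSegments, levels: Dict[int, str]) -> List[int]:
    # Looks for L0 segments in the flushed list, which never contains any.
    return [sid for sid in info.flushed_segment_ids if levels.get(sid) == "L0"]


def pick_l0_from_level_zero(info: ChannelSegments) -> List[int]:
    # Reads L0 segments from the dedicated level-zero list.
    return list(info.level_zero_segment_ids)


# Made-up segment IDs: two flushed L1 segments and one L0 segment.
levels = {1: "L1", 2: "L1", 100: "L0"}
info = ChannelSegments(flushed_segment_ids=[1, 2], level_zero_segment_ids=[100])

wrong = pick_l0_from_flushed(info, levels)
right = pick_l0_from_level_zero(info)
print(wrong, right)  # [] [100]
```

In this model, the lookup against the flushed list finds nothing, so no L0 segment (and therefore no delete record) would be loaded at that step.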

xiaocai2333 commented 1 day ago

L0 was eventually loaded because QueryCoord detected that some L0 segments were not loaded and resent the segment-loading request.
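
The recovery described here amounts to comparing what should be loaded with what is loaded and re-requesting the difference. A minimal Python sketch of that idea follows; the function name, data shapes, and segment IDs are hypothetical and do not reflect QueryCoord's actual implementation.

```python
from typing import Dict, List, Set


def segments_to_resend(target: Dict[int, str], loaded: Set[int]) -> List[int]:
    """Return IDs that are in the target set but not yet loaded, in sorted order."""
    return sorted(set(target) - loaded)


# Made-up state: the L0 segment (ID 100) was skipped during the initial load.
target = {1: "L1", 2: "L1", 100: "L0"}
loaded = {1, 2}

missing = segments_to_resend(target, loaded)
print(missing)  # [100]

# After the resent request succeeds, nothing is left to reload.
loaded.update(missing)
assert segments_to_resend(target, loaded) == []
```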

XuanYang-cn commented 1 day ago

/unassign

xiaofan-luan commented 15 hours ago

/assign @NicoYuan1986

Could you please help verify it?

yanliang567 commented 9 minutes ago

Verified: the issue no longer reproduces on https://jenkins.milvus.io:18080/blue/rest/organizations/jenkins/pipelines/Milvus%20Nightly%20CI(new)/branches/master/runs/180/nodes/66/steps/137/log/?start=0