milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Incomplete query result #34247

Closed: bigsheeper closed this issue 2 months ago

bigsheeper commented 3 months ago

Is there an existing issue for this?

Environment

- Milvus version: master & 2.4
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

In segcore, the query returns as soon as the number of collected results reaches the requested limit.

As a result, when a segment contains duplicate primary keys (PKs), the returned results may include duplicates, and the duplicates count toward the limit. We need to deduplicate PKs in the segcore reduce, as is already done in the internal reduce and the global reduce.

This issue exists for both growing segments and sealed segments.
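A minimal Python sketch of the deduplication described above, contrasting the current early return with one that counts only distinct PKs toward the limit. This is an illustration under assumptions, not segcore code (segcore is C++): rows are hypothetical `(pk, value)` pairs in segment order, the first occurrence of a PK wins, and `collect_naive` / `collect_unique` are hypothetical names.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, str]  # hypothetical (pk, value) representation


def collect_naive(rows: Iterable[Row], limit: int) -> List[Row]:
    """Current behavior: stop once `limit` rows are collected, duplicates included."""
    out: List[Row] = []
    for row in rows:
        out.append(row)
        if len(out) >= limit:
            break  # duplicated PKs have already consumed result slots
    return out


def collect_unique(rows: Iterable[Row], limit: int) -> List[Row]:
    """Deduplicating variant: only distinct PKs count toward `limit`."""
    seen = set()
    out: List[Row] = []
    for pk, value in rows:
        if pk in seen:
            continue  # skip a duplicated PK; first occurrence wins in this sketch
        seen.add(pk)
        out.append((pk, value))
        if len(out) >= limit:
            break  # early return now happens only after `limit` distinct PKs
    return out
```

With `rows = [(1, "a"), (1, "a2"), (2, "b"), (3, "c")]` and `limit = 3`, `collect_naive` returns three rows covering only two distinct PKs, while `collect_unique` returns PKs 1, 2 and 3.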

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

bigsheeper commented 3 months ago

see also: https://github.com/milvus-io/milvus/issues/34021

xiaofan-luan commented 3 months ago

We need to make sure that no segment contains duplicate PKs.

This is done by:

  1. Temporary fix: mask entries with duplicated PKs as deleted.
  2. Final fix: when a segment is flushed, run an analyze task to reorder the data.

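
The temporary fix (item 1) can be sketched as building a delete mask over a segment's rows. This is a hedged illustration, not Milvus code: it assumes PKs and insert timestamps are available as parallel arrays and that, for each PK, the row with the newest timestamp should survive (ties go to the later row). `mask_duplicate_pks` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def mask_duplicate_pks(pks: Sequence[int], timestamps: Sequence[int]) -> List[bool]:
    """Return a delete mask where True means "treat this row as deleted".

    Assumption: the row with the highest timestamp per PK is kept;
    on equal timestamps the later row offset wins.
    """
    if len(pks) != len(timestamps):
        raise ValueError("pks and timestamps must have the same length")
    keep: Dict[int, int] = {}  # pk -> offset of the row to keep
    for offset, (pk, ts) in enumerate(zip(pks, timestamps)):
        kept = keep.get(pk)
        if kept is None or ts >= timestamps[kept]:
            keep[pk] = offset
    kept_offsets = set(keep.values())
    # Every row that is not the chosen survivor for its PK gets masked.
    return [offset not in kept_offsets for offset in range(len(pks))]
```

The resulting mask could be OR-ed with a segment's existing delete bitmap so that masked rows are skipped the same way as deleted ones. For `pks = [10, 20, 10, 30, 20]` and `timestamps = [1, 2, 3, 4, 2]` it returns `[True, True, False, False, False]`.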
bigsheeper commented 3 months ago

@zhagnlu will help implement fix 1 (the temporary fix).

bigsheeper commented 3 months ago

/assign @zhagnlu