milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Lose valid data when setting radius and range filter in vector search #33636

Closed sunjiaqi52777 closed 21 hours ago

sunjiaqi52777 commented 1 month ago

Is there an existing issue for this?

Environment

- Milvus version: 2.3
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

I have a collection that contains over 86,000,000 entities. When I ran a COSINE similarity search with radius=0.7, range_filter=1.001, nprobe and nlist both set to 128, and topk=250, the expected record was not returned even though its score is 0.712.
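
For reference, the search settings described above can be sketched as a parameter dict of the kind pymilvus accepts for range search. This is a minimal sketch, not the reporter's actual code: the helper names `build_range_search_params` and `in_cosine_range` are hypothetical, and the bound check assumes the documented behavior for similarity metrics such as COSINE (a hit satisfies radius < score <= range_filter).

```python
def build_range_search_params(radius, range_filter=None, nprobe=128, metric_type="COSINE"):
    """Build a search-parameter dict in the shape used for pymilvus range search.

    Hypothetical helper for illustration; only the dict layout matters.
    """
    params = {"nprobe": nprobe, "radius": radius}
    if range_filter is not None:
        # range_filter is optional; leaving it out keeps only the radius bound.
        params["range_filter"] = range_filter
    return {"metric_type": metric_type, "params": params}


def in_cosine_range(score, radius, range_filter=None):
    """Assumed rule for similarity metrics (COSINE, IP): radius < score <= range_filter."""
    if score <= radius:
        return False
    return range_filter is None or score <= range_filter


search_params = build_range_search_params(radius=0.7, range_filter=1.001)
print(search_params)
print(in_cosine_range(0.712, radius=0.7, range_filter=1.001))  # True
```

A dict like `search_params` would be passed as the `param` argument of `Collection.search`, together with `limit=250`. By this rule a score of 0.712 lies inside the requested range, so the bounds alone do not explain the missing record.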

I would like to understand how Milvus's search mechanism works. I am also curious whether this issue is caused by the quantized index type (I'm using IVF_SQ8). Would the issue be solved if I changed the index type from IVF_SQ8 to IVF_FLAT?
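
To give a feel for the quantization question, the sketch below compares an exact cosine score with the score computed after storing each vector component in 8 bits. It is a toy uniform scalar quantizer written for illustration, not the IVF_SQ8 implementation in Milvus. The vectors, the dimension, and the function names are all assumptions.

```python
import math
import random


def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def quantize_8bit(vec, lo, hi):
    """Snap each component to one of 256 evenly spaced levels in [lo, hi]."""
    step = (hi - lo) / 255
    return [lo + round((min(max(x, lo), hi) - lo) / step) * step for x in vec]


random.seed(0)
dim = 128
query = [random.gauss(0, 1) for _ in range(dim)]
# A stored vector that is moderately similar to the query (made-up data).
stored = [q + random.gauss(0, 1) for q in query]

exact = cosine(query, stored)
approx = cosine(query, quantize_8bit(stored, min(stored), max(stored)))
print(f"exact={exact:.4f} approx={approx:.4f} diff={abs(exact - approx):.4f}")
```

In this toy setup the two scores differ only slightly, but any difference can matter for a record whose true score sits close to the radius. How large the error is with a real IVF_SQ8 index depends on the data and on the index implementation.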

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

yanliang567 commented 1 month ago

@sunjiaqi52777 May I ask how many results the range search returned? Are they all in the range [0.7, 1.001]? I am asking because ANN search is not always 100% accurate. You could try IVF_FLAT, but it is not 100% accurate either.

/assign @sunjiaqi52777
/unassign
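
One reason the number of returned results matters: if more records fall inside the range than topk allows, a record near the lower bound can be cut off even though it is in range. The thread does not confirm this as the cause. The sketch below only illustrates the effect, using made-up scores and a hypothetical `range_search` function that is not Milvus code.

```python
def range_search(scores, radius, range_filter, limit):
    """Keep scores with radius < s <= range_filter, best first, cut to `limit`."""
    hits = sorted((s for s in scores if radius < s <= range_filter), reverse=True)
    return hits[:limit]


# Hypothetical data: 300 records scoring above 0.712, plus the record of interest.
scores = [0.72 + i * 0.0005 for i in range(300)] + [0.712]

results = range_search(scores, radius=0.7, range_filter=1.001, limit=250)
print(len(results))      # 250
print(0.712 in results)  # False
```

If a real search returns exactly topk results, raising topk or the radius is a quick way to check whether truncation is hiding the expected record.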

sunjiaqi52777 commented 1 month ago

@yanliang567 Thanks for replying. I tried both Milvus Attu and pymilvus, and they returned different result counts with topk=250.

xiaofan-luan commented 1 month ago

  1. You don't really need to set range_filter=1.001.
  2. If Attu and pymilvus return different results, pymilvus might have a bug. Did you try the latest pymilvus 2.3?

stale[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.