milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
31.03k stars 2.95k forks

[Bug]: Incorrect search result of pymilvus example #20703

Closed yhmo closed 2 years ago

yhmo commented 2 years ago

Is there an existing issue for this?

Environment

- Milvus version:
- Deployment mode(standalone or cluster):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Start a Milvus server (2.2) and run this pymilvus example: https://github.com/milvus-io/pymilvus/blob/master/examples/example.py

The example inserts 10000 vectors into Milvus, then uses the No.1, No.2, and No.3 vectors as search queries.

[Screenshot from 2022-11-18 11-44-33]

But the result is incorrect: in the result for the No.1 vector, Top 0 should be the No.1 vector itself, but it isn't.

[Screenshot from 2022-11-18 10-15-40]

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

yhmo commented 2 years ago

Same problem: https://github.com/milvus-io/milvus/discussions/20680

yanliang567 commented 2 years ago

/assign @jiaoew1991 /unassign

jiaoew1991 commented 2 years ago

/assign @XuanYang-cn /unassign

XuanYang-cn commented 2 years ago

    search_param = {
        "data": search_vectors,
        "anns_field": vector_field,
        "param": {"metric_type": _METRIC_TYPE, "params": {"nprobe": _NPROBE}},
        "limit": _TOPK,
        "expr": "id_field > 0"}

The first vector's id is 0, and the expr requires "id_field > 0", so that vector is filtered out of the candidates and the first result can't be the No.1 vector.
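
To see the effect of the filter in isolation, here is a minimal sketch that needs no Milvus server. It is a brute-force stand-in, not Milvus's search path: `filtered_search`, the random data, and the use of plain Euclidean distance are all assumptions made for illustration. The filter is applied before ranking, so a query vector whose own id fails the filter can never appear as its own nearest neighbour.

```python
import math
import random


def filtered_search(vectors, query, limit, predicate):
    """Hypothetical brute-force search: rank only the rows whose id passes `predicate`."""
    candidates = [
        (math.dist(query, vec), vec_id)  # Euclidean distance (an assumption)
        for vec_id, vec in enumerate(vectors)
        if predicate(vec_id)
    ]
    # Sort by distance and keep the closest `limit` hits.
    return sorted(candidates)[:limit]


rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(1000)]
query = vectors[0]  # search with the vector stored under id 0

# Mirrors expr "id_field > 0": id 0 is excluded before ranking.
strict = filtered_search(vectors, query, 3, lambda vec_id: vec_id > 0)
# Mirrors expr "id_field >= 0": id 0 stays in the candidate set.
inclusive = filtered_search(vectors, query, 3, lambda vec_id: vec_id >= 0)

print("id_field > 0 :", strict)
print("id_field >= 0:", inclusive)
```

With the strict filter, the top hit is simply the nearest remaining neighbour; with the inclusive filter, id 0 comes back first at distance 0.0.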

Changing "id_field > 0" to "id_field >= 0" gives the following:

Search result for 0th vector: 
Top 0: (distance: 0.0, id: 0)
Top 1: (distance: 15.004575729370117, id: 5568)
Top 2: (distance: 15.679189682006836, id: 8573)

Search result for 1th vector: 
Top 0: (distance: 0.0, id: 1)
Top 1: (distance: 14.305086135864258, id: 4011)
Top 2: (distance: 14.806175231933594, id: 3669)

Search result for 2th vector: 
Top 0: (distance: 0.0, id: 2)
Top 1: (distance: 14.328184127807617, id: 5903)
Top 2: (distance: 14.504249572753906, id: 471)

XuanYang-cn commented 2 years ago

/unassign /assign @yhmo