milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Querying with the same vector gives very different similarity scores: cosine similarity computed locally is 0.999999, while Milvus returns 0.81 or 0.91 #29787

Closed silencesmile closed 8 months ago

silencesmile commented 9 months ago

Is there an existing issue for this?

Environment

- Milvus version: 2.3.x
- Deployment mode (standalone or cluster): standalone
- MQ type (rocksmq, pulsar or kafka):
- SDK version (e.g. pymilvus v2.0.0rc2): 2.3.4
- OS (Ubuntu or CentOS): Ubuntu
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

I use vector clustering to generate cluster-center vectors and insert them into Milvus. The stored values differ from the original vectors only in the last few decimal places. When I query with the same vector again, Milvus reports a similarity of only 0.81. Even when I query with a vector copied directly from Milvus, the reported similarity is only a little over 0.8. Computing the cosine similarity of the same vectors locally gives 0.9999. As a result, the top-N similarity results returned by Milvus are far off.

222222 1111 python_cos_similarity.py.txt

Expected Behavior

I expect a query with the same vector to return a similarity of 1.0.

Steps To Reproduce

1. Rename the attached .txt file to a .py file
2. Run the .py file; it reports a similarity of 0.9999
3. Insert the vectors from the .py file into Milvus
4. Query Milvus with another set of vectors or with the inserted vectors
5. Milvus reports a similarity of 0.81

Milvus Log

No response

Anything else?

No response

github-actions[bot] commented 9 months ago

The title and description of this issue contains Chinese. Please use English to describe your issue.

xiaofan-luan commented 9 months ago

@silencesmile please rewrite your description in English, thanks

xiaofan-luan commented 9 months ago

Milvus performs approximate nearest neighbor (ANN) search, which does not guarantee that the true top-1 result is returned.

You should not use nprobe == 1; with that setting it is quite possible that you did not get the entity back.

Please check the search results to see whether you got the original vector you wanted.
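To illustrate the nprobe point, here is a toy sketch in plain NumPy. It is not Milvus code: the centroids, stored vectors, and the helpers `nearest_centroids` and `ivf_search_top1` are all hypothetical, and it uses L2 distance for simplicity. It only shows how an IVF-style search that scans a single cluster can miss the true nearest neighbor when that neighbor sits just across a cluster boundary.

```python
import numpy as np

# Hypothetical toy data: two cluster centroids and three stored vectors.
centroids = np.array([[0.0, 0.0], [10.0, 0.0]])
vectors = np.array([[1.0, 0.0], [4.9, 0.0], [6.0, 0.0]])


def nearest_centroids(x, k):
    """Return the indices of the k centroids closest to x (L2 distance)."""
    dists = np.linalg.norm(centroids - x, axis=1)
    return np.argsort(dists)[:k]


# Each stored vector is assigned to its single closest centroid.
assignments = np.array([nearest_centroids(v, 1)[0] for v in vectors])


def ivf_search_top1(query, nprobe):
    """Scan only the nprobe closest clusters; return the index of the best vector."""
    probed = nearest_centroids(query, nprobe)
    candidates = np.flatnonzero(np.isin(assignments, probed))
    dists = np.linalg.norm(vectors[candidates] - query, axis=1)
    return int(candidates[np.argmin(dists)])


# The query lies just on the far side of the boundary from its true neighbor.
query = np.array([5.05, 0.0])

# Brute-force answer over all stored vectors, for comparison.
exact_top1 = int(np.argmin(np.linalg.norm(vectors - query, axis=1)))
top1_nprobe1 = ivf_search_top1(query, nprobe=1)
top1_nprobe2 = ivf_search_top1(query, nprobe=2)

print("exact:", exact_top1, "nprobe=1:", top1_nprobe1, "nprobe=2:", top1_nprobe2)
```

In this toy setup the search with nprobe=1 returns a different vector than the brute-force scan, while probing both clusters recovers the exact result. Whether this is what happened in the reported case depends on the index type and search parameters, which the issue does not state.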

yanliang567 commented 9 months ago

@silencesmile I think you should normalize your vectors, since you are using the IP (inner product) metric. Try running the attached file: python_cos_similarity.py.normalize.txt
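A minimal sketch of why normalization matters with the IP metric, using NumPy only. The vectors and the helpers `cosine_similarity` and `l2_normalize` are illustrative assumptions, not taken from the attached files. The inner product of two vectors equals their cosine similarity only when both have unit length, so an unnormalized vector compared with itself can score well below 1.0 under IP.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity: inner product divided by the product of the L2 norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def l2_normalize(v):
    """Scale a vector to unit L2 length."""
    return v / np.linalg.norm(v)


# Hypothetical example: the same vector used as both stored entity and query.
stored = np.array([0.3, 0.4, 0.5])
query = np.array([0.3, 0.4, 0.5])

raw_ip = float(np.dot(stored, query))            # squared norm, about 0.5 here
cos = cosine_similarity(stored, query)           # about 1.0 for identical vectors
normalized_ip = float(np.dot(l2_normalize(stored), l2_normalize(query)))

print("raw IP:", raw_ip)
print("cosine:", cos)
print("IP after normalization:", normalized_ip)
```

With identical inputs, the raw inner product is just the squared norm of the vector, which is why it can land at a value such as 0.8 or 0.9 instead of 1.0. Normalizing both the inserted vectors and the query vectors before using IP makes the score match cosine similarity.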

yanliang567 commented 9 months ago

/assign @silencesmile /unassign

stale[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.