milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Two identical matrices, inaccurate similarity? #19822

Closed S-Dragon0302 closed 1 year ago

S-Dragon0302 commented 2 years ago

Is there an existing issue for this?

Environment

- Milvus version: 2.1.4
- Deployment mode (standalone or cluster): cluster
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS): CentOS
- CPU/Memory: 8/64
- GPU:
- Others:

Current Behavior

The search vector and the top result vector are identical, yet the returned similarity score is not 1. Search vector and result vector: 0.11,-0.09,-0.03,0.19,-0.03,0.04,0.03,0.06,-0.01,-0.07,-0.04,0.05,-0.01,-0.08,-0.02,-0.12,-0.17,0.0,-0.05,0.0,0.06,0.15,0.05,0.13,-0.01,0.0,0.01,0.13,-0.07,-0.02,-0.14,0.13,0.01,0.09,-0.03,0.05,-0.04,0.03,-0.02,0.02,-0.12,0.03,0.04,0.05,-0.01,0.01,0.02,0.0,-0.09,-0.06,0.14,0.22,-0.1,0.09,0.14,0.06,0.07,0.02,-0.03,0.0,-0.06,-0.05,-0.04,0.01,0.08,-0.02,0.15,0.06,-0.03,-0.08,-0.04,-0.21,-0.02,0.05,-0.1,-0.01,-0.14,-0.06,-0.14,-0.1,0.17,0.14,0.11,-0.17,-0.12,-0.04,-0.11,0.05,-0.11,0.14,-0.03,-0.15,0.05,0.09,0.02,0.06,0.18,0.08,0.04,0.0,0.01,0.04,-0.06,-0.2,-0.06,0.04,-0.03,0.01,-0.06,-0.06,0.05,-0.09,0.0,0.03,-0.01,0.15,0.16,-0.01,0.08,0.15,0.17,0.09,0.06,-0.02,0.03,0.08,0.08,-0.13

Expected Behavior

Score: 1

Steps To Reproduce

1. indexType
MetricType: 1024
nlist: 1024
m: 4
nbits: 8
2. searchType
"{\"nprobe\":32}"
vector:0.11,-0.09,-0.03,0.19,-0.03,0.04,0.03,0.06,-0.01,-0.07,-0.04,0.05,-0.01,-0.08,-0.02,-0.12,-0.17,0.0,-0.05,0.0,0.06,0.15,0.05,0.13,-0.01,0.0,0.01,0.13,-0.07,-0.02,-0.14,0.13,0.01,0.09,-0.03,0.05,-0.04,0.03,-0.02,0.02,-0.12,0.03,0.04,0.05,-0.01,0.01,0.02,0.0,-0.09,-0.06,0.14,0.22,-0.1,0.09,0.14,0.06,0.07,0.02,-0.03,0.0,-0.06,-0.05,-0.04,0.01,0.08,-0.02,0.15,0.06,-0.03,-0.08,-0.04,-0.21,-0.02,0.05,-0.1,-0.01,-0.14,-0.06,-0.14,-0.1,0.17,0.14,0.11,-0.17,-0.12,-0.04,-0.11,0.05,-0.11,0.14,-0.03,-0.15,0.05,0.09,0.02,0.06,0.18,0.08,0.04,0.0,0.01,0.04,-0.06,-0.2,-0.06,0.04,-0.03,0.01,-0.06,-0.06,0.05,-0.09,0.0,0.03,-0.01,0.15,0.16,-0.01,0.08,0.15,0.17,0.09,0.06,-0.02,0.03,0.08,0.08,-0.13
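// Imports needed by this snippet (milvus-sdk-java 2.x).
import io.milvus.grpc.SearchResults;
import io.milvus.param.MetricType;
import io.milvus.param.R;
import io.milvus.param.dml.SearchParam;

// Build a search request that uses the inner-product (IP) metric.
// SEARCH_PARAM is presumably the nprobe JSON string shown above.
SearchParam searchParam = SearchParam.newBuilder()
        .withCollectionName(COLLECTION_NAME)
        .withMetricType(MetricType.IP)
        .withOutFields(outFields)
        .withTopK(SEARCH_K)
        .withVectors(vectors)
        .withVectorFieldName(VECTOR_FIELD)
        .withParams(SEARCH_PARAM)
        .build();

R<SearchResults> response = milvusClient.search(searchParam);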

Score: 0.89371216

Milvus Log

No response

Anything else?

No response

yanliang567 commented 2 years ago

/assign @cydrain /unassign

cydrain commented 2 years ago

It looks like you are using index type "IVF_PQ" with metric type "IP". IVF_PQ applies product quantization to the stored vectors, so the score you see is not the inner product of vector 'a' with itself but of 'a' with 'a_mod', the lossy approximation of 'a' produced by product quantization. That is why the score is not 1.
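To make the effect concrete, here is a minimal Java sketch of the encoding step of product quantization. It is not Milvus code: the 4-dimensional vector, the split into two sub-vectors, and the tiny two-centroid codebooks are all hypothetical values chosen for illustration (a real IVF_PQ index learns its codebooks from the data). Each sub-vector is replaced by its nearest centroid, and the inner product is then taken against that approximation.

```java
class PqScoreSketch {
    // Inner product of two equal-length vectors.
    static double innerProduct(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    // Replace each sub-vector of v by the nearest centroid (squared L2 distance)
    // from that sub-space's codebook. codebooks[s][c] is centroid c of sub-space s.
    static double[] quantize(double[] v, double[][][] codebooks) {
        int m = codebooks.length;          // number of sub-vectors
        int subDim = v.length / m;         // dimensions per sub-vector
        double[] out = new double[v.length];
        for (int s = 0; s < m; s++) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < codebooks[s].length; c++) {
                double dist = 0.0;
                for (int d = 0; d < subDim; d++) {
                    double diff = v[s * subDim + d] - codebooks[s][c][d];
                    dist += diff * diff;
                }
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            System.arraycopy(codebooks[s][best], 0, out, s * subDim, subDim);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical unit-length vector: 0.36 + 0.64 = 1.
        double[] a = {0.6, 0.0, 0.0, 0.8};
        // Hypothetical hand-written codebooks: 2 sub-spaces, 2 centroids each.
        double[][][] codebooks = {
            {{0.5, 0.1}, {-0.5, 0.0}},
            {{0.1, 0.7}, {0.0, -0.7}}
        };
        double[] aMod = quantize(a, codebooks);
        System.out.println("IP(a, a)     = " + innerProduct(a, a));
        System.out.println("IP(a, a_mod) = " + innerProduct(a, aMod));
    }
}
```

With these made-up numbers the self inner product is about 1, while the inner product against the quantized copy comes out near 0.86. This is the same kind of gap as the 0.89 score reported above, although the actual value depends on the codebooks the index learned.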

You will also see a score other than 1.0 with index type IVF_SQ8, which quantizes the stored vectors as well. If you choose index type IVF_FLAT, which stores the original vectors, the score will be 1.0.
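One caveat on the 1.0 figure: with the IP metric, the score of a vector against an exact copy of itself is its squared length, so it equals 1.0 only when the vector is normalized to unit length. The Java sketch below illustrates this; it is a standalone example, not Milvus code, and the four sample values are simply the first four components of the vector in the report, used here as a hypothetical short vector.

```java
class UnitNormCheck {
    // Inner product of two equal-length vectors.
    static double innerProduct(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    // Scale v to unit L2 length. Assumes v is not the zero vector.
    static double[] normalize(double[] v) {
        double norm = Math.sqrt(innerProduct(v, v));
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical short vector (first four components from the report).
        double[] raw = {0.11, -0.09, -0.03, 0.19};
        double[] unit = normalize(raw);
        // Self inner product equals the squared length of the vector.
        System.out.println("IP(raw, raw)   = " + innerProduct(raw, raw));
        System.out.println("IP(unit, unit) = " + innerProduct(unit, unit));
    }
}
```

If the vectors are normalized before insertion and before search, an exact (non-quantizing) index returns a self-match score of 1 up to floating-point rounding.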

yanliang567 commented 2 years ago

/assign @S-Dragon0302

@S-Dragon0302 could you please try as suggested above? /unassign @cydrain

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.