milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: the distance of IVF_PQ, GPU_IVF_PQ search results are wrong #35151

Closed liangbug closed 3 months ago

liangbug commented 3 months ago

Is there an existing issue for this?

Environment

- Milvus version: 2.4.6-gpu
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus 2.4.4
- OS(Ubuntu or CentOS): Ubuntu 22.04
- CPU/Memory: Intel(R) Xeon(R) Gold 6240, 512GB ram
- GPU: Nvidia RTX 8000
- Others:

Current Behavior

I inserted 90k unit vectors (384 dimensions, length 1) and created IVF_PQ and GPU_IVF_PQ indexes with metric: IP, "m": 8, "nlist": 1024. When I search with vectors that are already stored in the db, the distance of the top-1 result (the exact same vector) is sometimes less than 1 and sometimes equal to 1.

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

binbinlv commented 3 months ago

@liangbug could you please paste the detailed milvus log here? Thanks.

Do you mean that the distance of the top-1 result is different when you search twice using the same vector?

liangbug commented 3 months ago

The figures below were produced with the zilliztech/attu image.

image

image

image

pymilvus also returns the same results.

xiaofan-luan commented 3 months ago

I don't quite understand what you are saying. There is no guarantee that the same search returns the same result, especially in your case, where you are using IVF_PQ and the recall is very low.

We cannot guarantee that the same result is returned because:

  1. a growing segment can be handed off to a sealed segment
  2. a sealed segment can be compacted.

I suggest you use better search parameters:

  1. do not use PQ
  2. use a larger nprobe. For nlist=1024, nprobe=32 or 64 is recommended
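
For reference, the two suggestions above could look roughly like this on the client side. This is only a sketch: the dictionaries follow the shape that pymilvus accepts for index and search parameters, IVF_FLAT is picked here as one example of a non-PQ index, and nprobe=64 is simply the upper value recommended above.

```python
# Sketch only: parameter dictionaries in the shape pymilvus expects for
# create_index(...) and search(...). No server connection is made here.
NLIST = 1024

# IVF_FLAT keeps the raw vectors in each cluster, so no PQ approximation
# is applied to the returned distance (assumed replacement for IVF_PQ).
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "IP",
    "params": {"nlist": NLIST},
}

# A larger nprobe scans more of the nlist clusters per query.
search_params = {
    "metric_type": "IP",
    "params": {"nprobe": 64},
}

# Fraction of clusters scanned per query with these settings.
probe_fraction = search_params["params"]["nprobe"] / NLIST
print(probe_fraction)
```

With nlist=1024 and nprobe=64, each query scans 6.25% of the clusters; nprobe=32 would scan half of that.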

yanliang567 commented 3 months ago

/assign @liangbug /unassign

liangbug commented 3 months ago

@binbinlv

> @liangbug could you please paste the detailed milvus log here? Thanks.
>
> And here what you mean is the distance of top1 result is different when you search twice using the same vector?

I don't see anything unusual in the milvus standalone info log.
What kind of log do you need? Can you describe in detail?

Searching twice or more always returns the same result.

The vectors with id 450885682672689422 and id 450885682672689423 both have length 1. When I search with vector 450885682672689423, I get vector 450885682672689423 back with a distance (score) of 0.99. But when I search with vector 450885682672689422, I get vector 450885682672689422 back with a distance (score) of 0.8. Why are they different?

cqy123456 commented 3 months ago

  1. Is the input data normalized? If not, IP<x, x> = ||x||^2, and there is a high probability that it is not 1.
  2. You use M = 8 with dim = 384, which means each vector is split into 8 sub-vectors of 48 dimensions and the distance Metric<x, y> is the sum of 8 code distances. This is an approximate (lossy) way to calculate the distance.
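
To make point 2 concrete, here is a toy sketch in plain Python (it is not Milvus or Knowhere code). It assumes 200 random 32-dimensional unit vectors, 16 centroids per sub-quantizer, a few k-means iterations, and no IVF coarse quantizer or residual encoding. All function names are made up for illustration. It scores each vector against its own PQ code, which is roughly what happens when a stored vector is used as the query.

```python
import math
import random


def unit_vector(rng, dim):
    """Random vector scaled to length 1."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters, rng):
    """Very small k-means; an empty cluster keeps its old centroid."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            buckets[nearest].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def pq_self_scores(vectors, m, ksub, rng):
    """IP between each vector and its own PQ code, summed over m sub-vectors."""
    sub = len(vectors[0]) // m
    scores = [0.0] * len(vectors)
    for s in range(m):
        chunks = [v[s * sub:(s + 1) * sub] for v in vectors]
        centroids = kmeans(chunks, ksub, 5, rng)
        for i, chunk in enumerate(chunks):
            code = min(centroids, key=lambda c: sq_dist(chunk, c))
            scores[i] += dot(chunk, code)  # one "code distance" per sub-vector
    return scores


rng = random.Random(0)
vectors = [unit_vector(rng, 32) for _ in range(200)]
exact = [dot(v, v) for v in vectors]  # exact IP<x, x> of a unit vector

for m in (2, 8):
    approx = pq_self_scores(vectors, m, 16, rng)
    print("M =", m, "min", round(min(approx), 3), "max", round(max(approx), 3))
```

The exact self-score is 1 for every vector, while the PQ self-score is IP<x, x'> with x' the reconstruction from centroids, so it generally differs from 1 and varies from vector to vector. The printed ranges depend on the seed and the toy sizes chosen here.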

liangbug commented 3 months ago

@cqy123456

> 1. Is the input data normalized? If not, IP<x, x> = x^2, there is a high probability that it is not 1;
> 2. you use M = 8 with dim = 384, it means the distance of Metric<x, y> is the sum of 8 code distances. and it is an erroneous way to calculate distance.

  1. The input data is normalized.
  2. When the vector with id 450885682672689422 is used to search for itself, the distance (score) is 0.78 with M = 1, 0.8 with M = 8, and 0.97 with M = 128.

Is it sensible to return the PQ metric instead of the original metric? Do any other algorithms return a non-original metric?

cqy123456 commented 3 months ago

CPU indexes: IVF_FLAT, HNSW, and SCANN; GPU indexes: GPU_CAGRA, GPU_IVF_FLAT, and GPU_BRUTE_FORCE.