The IP space [1.0 - sum(Ai*Bi)] might be inaccurate. Why not separate the IP from the cosine, and define the IP simply as [-sum(Ai*Bi)]?

nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors

https://github.com/nmslib/hnswlib

Apache License 2.0


The IP space [1.0 - sum(Ai*Bi)] might be inaccurate. Why not separate the IP from the cosine, and define the IP simply as [-sum(Ai*Bi)]? #554

Closed. Arthur-Bi closed this issue 2 months ago

Arthur-Bi commented 3 months ago

I understand that defining IP as [1.0 - sum(Ai*Bi)] is really convenient for calculating cosine similarity, but it might cause inaccuracy when calculating IP. I checked faiss and milvus, and they do not have the [1.0 -] term.
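To illustrate the two definitions being compared, here is a minimal sketch in plain Python. The helper names (`normalize`, `inner_product`) and the sample vectors are made up for this example and are not hnswlib's API. It assumes unit-length vectors, in which case [1.0 - sum(Ai*Bi)] is the cosine distance, and it shows that in double precision both definitions rank the candidates identically, since they differ only by a constant shift.

```python
import math

def normalize(v):
    """Scale a vector to unit length (hypothetical helper for this sketch)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def inner_product(a, b):
    """Plain sum(Ai*Bi)."""
    return sum(x * y for x, y in zip(a, b))

query = normalize([3.0, 4.0])
candidates = {
    "same_direction": normalize([6.0, 8.0]),
    "orthogonal": normalize([-4.0, 3.0]),
    "opposite": normalize([-3.0, -4.0]),
}

# Definition discussed in the issue: 1.0 - sum(Ai*Bi)
shifted = {name: 1.0 - inner_product(query, v) for name, v in candidates.items()}
# Proposed alternative: -sum(Ai*Bi)
negated = {name: -inner_product(query, v) for name, v in candidates.items()}

# Smaller distance means a closer neighbor under both definitions.
rank_shifted = sorted(shifted, key=shifted.get)
rank_negated = sorted(negated, key=negated.get)

print(rank_shifted)
print(rank_negated)
```

For unit vectors the shifted value lies in [0, 2], which is why it doubles as a cosine distance; the negated value lies in [-1, 1] and carries the same ordering.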

yurymalkov commented 2 months ago

Hm. Why would that be inaccurate? I can imagine it making a difference only if the values of -sum(Ai*Bi) are very close to zero (i.e., nearly orthogonal vectors), which seems to be very uncommon.
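The near-zero case mentioned here can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of hnswlib's code: it assumes distances are stored as 32-bit floats and emulates that rounding with the standard `struct` module. The helper `f32` and the two sample inner-product values are hypothetical.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def ip_distance_shifted(ip: float) -> float:
    """1.0 - sum(Ai*Bi), with the result rounded to float32."""
    return f32(1.0 - f32(ip))

def ip_distance_negated(ip: float) -> float:
    """-sum(Ai*Bi), with the result rounded to float32."""
    return f32(-f32(ip))

# Two hypothetical inner products that are tiny but clearly different.
ip_a = 1.0e-9
ip_b = 2.0e-9

shifted = (ip_distance_shifted(ip_a), ip_distance_shifted(ip_b))
negated = (ip_distance_negated(ip_a), ip_distance_negated(ip_b))

# Near 1.0 the float32 spacing is about 6e-8, so both shifted values
# collapse to the same number and the ordering between them is lost.
print(shifted)
# Around zero float32 has far finer resolution, so the negated values
# remain distinct and keep their order.
print(negated)
```

With inner products of ordinary magnitude (for example 0.25 versus 0.5) both forms stay distinct, which is consistent with the point that the difference only shows up when the values are very close to zero.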