nmslib / hnswlib

Header-only C++/Python library for fast approximate nearest neighbors
https://github.com/nmslib/hnswlib
Apache License 2.0

cosine distance instead of cosine similarity #456

Closed anna-charlotte closed 6 months ago

anna-charlotte commented 1 year ago

Hello there,

I think there has been a mix-up between cosine similarity and cosine distance. The docs say that you support cosine similarity, but what is actually calculated is the cosine distance (1 - cosine similarity). The resulting order is still correct, but the scores themselves are not similarity values. For instance, when computing the distance of the vector x = [1.0, 1.0, 1.0, 0.0] to itself, the resulting score is 0.0, whereas the cosine similarity would be 1.0.
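To make the difference concrete, here is a minimal standalone sketch in plain Python. It does not call hnswlib at all; the function names `cosine_similarity` and `cosine_distance` are illustrative helpers, and the only assumption is the relation distance = 1 - similarity described above.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm

def cosine_distance(a, b):
    # Cosine distance as described in this issue: 1 - cosine similarity.
    return 1.0 - cosine_similarity(a, b)

x = [1.0, 1.0, 1.0, 0.0]
print(cosine_similarity(x, x))  # 1.0 (a vector is maximally similar to itself)
print(cosine_distance(x, x))    # 0.0 (the score reported in this issue)
```

Both quantities rank neighbors identically (smallest distance corresponds to largest similarity), which is why the returned order is unaffected.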

Could you adjust the formula?

yurymalkov commented 1 year ago

Hi @anna-charlotte, thanks for highlighting the issue! It seems we could instead rename "cosine similarity" to "cosine distance" to avoid this confusion, since changing the formula would break existing code.
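If the formula stays as it is, callers who want similarity scores can convert on their side. The snippet below is only a sketch: `distances_to_similarities` is a hypothetical helper, the input values are made up for illustration, and it assumes the returned scores are 1 - cosine similarity as discussed in this thread.

```python
def distances_to_similarities(distances):
    # Invert the relation distance = 1 - similarity for each score.
    return [1.0 - d for d in distances]

# Made-up distances, sorted ascending as a nearest-neighbor query would return them.
distances = [0.0, 0.25, 1.0]
similarities = distances_to_similarities(distances)
print(similarities)  # [1.0, 0.75, 0.0]
```

Ascending distances become descending similarities, so the neighbor order is preserved and no existing code has to change.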

anna-charlotte commented 1 year ago

Hi @yurymalkov, sure, that sounds good too 👍