spotify / voyager

🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability.
https://spotify.github.io/voyager/
Apache License 2.0

Cosine distance values outside of <-1;1> range. #46

Closed: emsi closed this issue 8 months ago

emsi commented 8 months ago

Version: voyager==2.0.2

The following code:

import numpy as np
from voyager import Index, Space

# Create an empty Index object that can store vectors:
index = Index(Space.Cosine, num_dimensions=5)
id_a = index.add_item([1, -2333, 3, 4, 5])
id_b = index.add_item([6, 7, -8999, 9, 10])

# Find the two closest elements:
neighbors, distances = index.query([1, 2, 3, 4, 5], k=2)
print(distances)

It prints the following result:

[1.266731 1.402931]

The cosine function returns values between -1 and 1, as shown on the graph below: (image: graph of the cosine function)

The values returned by the query function are clearly outside of that range.

For vectors with only positive coordinates, the returned values seem to be in the range <0; 1>, with 0 being the closest and 1.0 the farthest, but that does not make much sense, as a cosine of 1 means most similar (an angle of 0).

emsi commented 8 months ago

OK, I get it. You are using cosine distance, defined as 1 - cosine similarity, but this is not documented.
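For anyone who wants to check this, here is a minimal sketch that recomputes the two distances by hand, assuming cosine distance means 1 - cosine similarity. The helper `cosine_distance` is a hypothetical name for illustration and is not part of the voyager API. It uses only the standard library, and the vectors are the ones from the report above.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors.

    Hypothetical helper for illustration; not a voyager function.
    """
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


query = [1, 2, 3, 4, 5]
d_a = cosine_distance(query, [1, -2333, 3, 4, 5])
d_b = cosine_distance(query, [6, 7, -8999, 9, 10])

# Rounded to four decimals, these should agree with the values
# printed by index.query() in the original report.
print(round(d_a, 4), round(d_b, 4))
```

Both results are close to the reported `[1.266731 1.402931]`, which is consistent with the 1 - cosine similarity definition. Small differences in the last digits are expected, since the index presumably stores vectors in lower precision than Python floats.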

anilkumar2444 commented 6 months ago

So, the range of dist values is (0,2), right? 0 meaning similar & 2 meaning dissimilar, right?

emsi commented 6 months ago

So, the range of dist values is (0,2), right? 0 meaning similar & 2 meaning dissimilar, right?

Exactly.
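A small sketch of the three reference points on that scale, again assuming distance = 1 - cosine similarity. `cosine_distance` is a hypothetical helper written for this example, not a voyager function, and the 2-D vectors are made up for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity (illustrative helper, not voyager API)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


base = [3.0, 4.0]

same_direction = cosine_distance(base, [3.0, 4.0])    # angle 0: distance near 0
orthogonal = cosine_distance(base, [-4.0, 3.0])       # angle 90 degrees: distance near 1
opposite = cosine_distance(base, [-3.0, -4.0])        # angle 180 degrees: distance near 2

print(same_direction, orthogonal, opposite)
```

So a distance below 1 means the vectors point in broadly the same direction, and a distance above 1 (as in the original report) means the angle between them exceeds 90 degrees.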