pgvector / pgvector-php

pgvector support for PHP
MIT License

Potential Discrepancy in Cosine Similarity Calculation in HasNeighbors.php #10

Closed: dolcedev closed this issue 9 months ago

dolcedev commented 9 months ago

I was reviewing the pgvector-php documentation and noticed that, to calculate cosine similarity, it suggests using the formula 1 - cosine distance, as shown in the following SQL snippet:

SELECT 1 - (embedding <=> '[3,1,2]') AS cosine_similarity FROM items;
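For context, pgvector's `<=>` operator returns cosine distance, which is defined as 1 minus cosine similarity, so subtracting it from 1 recovers the similarity. The following self-contained PHP sketch shows that relationship with plain arrays and no database. The helper names `cosineSimilarity` and `cosineDistance` are hypothetical (not part of pgvector-php), and the `[1, 2, 3]` embedding is made up for illustration.

```php
<?php
// Hypothetical helper: cosine similarity of two equal-length vectors.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];
        $normA += $x * $x;
        $normB += $b[$i] * $b[$i];
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

// Hypothetical helper: cosine distance, the quantity <=> returns (1 - similarity).
function cosineDistance(array $a, array $b): float
{
    return 1.0 - cosineSimilarity($a, $b);
}

$query = [3, 1, 2];      // the vector from the SQL snippet
$embedding = [1, 2, 3];  // made-up stored embedding

$distance = cosineDistance($embedding, $query);
$similarity = 1.0 - $distance; // mirrors SELECT 1 - (embedding <=> '[3,1,2]')

printf("distance=%.4f similarity=%.4f\n", $distance, $similarity);
// prints "distance=0.2143 similarity=0.7857"
```

Here the dot product is 11 and both norms are sqrt(14), so the similarity is 11/14 and the distance is 3/14.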

However, when I examined HasNeighbors.php, I could not find this formula, or any other cosine similarity calculation, in the implementation. I am not an expert in vector distances, but based on the documentation, I suspect this could lead to incorrect results when using cosine similarity with the library. Is that correct?

Thank you in advance and great work!

ankane commented 9 months ago

Hi @dolcedev, nearestNeighbors adds a neighbor_distance attribute holding the distance, so you'll need to compute 1 - neighbor_distance to get the similarity.
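A minimal sketch of doing that conversion in application code, assuming the neighbors were ranked by cosine distance (for L2 or inner-product distances, 1 - neighbor_distance is not a cosine similarity). To keep it runnable without Laravel or a database, the results are stand-in arrays with made-up values, and `withCosineSimilarity` is a hypothetical helper, not part of pgvector-php.

```php
<?php
// Hypothetical helper (not part of pgvector-php): adds a cosine_similarity
// field to each row, assuming neighbor_distance holds a *cosine* distance.
function withCosineSimilarity(array $rows): array
{
    return array_map(
        fn (array $row): array => $row + ['cosine_similarity' => 1.0 - $row['neighbor_distance']],
        $rows
    );
}

// Stand-ins for rows returned by nearestNeighbors(); values are made up.
$neighbors = [
    ['id' => 1, 'neighbor_distance' => 0.25],
    ['id' => 2, 'neighbor_distance' => 0.5],
];

foreach (withCosineSimilarity($neighbors) as $row) {
    printf("id=%d similarity=%.2f\n", $row['id'], $row['cosine_similarity']);
}
// prints "id=1 similarity=0.75" then "id=2 similarity=0.50"
```

With real query results the same idea applies per record, i.e. 1 - the record's neighbor_distance attribute.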