Added pgvectorscale DiskANN index support, and changed distance metric for search in PGVector from Euclidean distance to Cosine similarity
Description
This change switches search in PGVector to a more relevant distance metric. Mem0's search of the PGVector collection currently uses Euclidean distance by default, which is not well suited to comparing text embeddings. In Qdrant, Mem0 uses the dot product, which is equivalent to cosine similarity for normalized vectors. Embeddings from OpenAI, the default provider, are normalized. For this reason, cosine similarity and dot product are the most relevant metrics.
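To illustrate the reasoning about metrics, here is a minimal sketch in plain Python. The helper names and the toy two-dimensional vectors are made up for illustration and are not taken from Mem0. It shows that the dot product matches cosine similarity once vectors are normalized, while Euclidean distance can be large for vectors that point in the same direction.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    return dot(a, b) / (norm(a) * norm(b))


def euclidean_distance(a, b):
    """Straight-line distance between a and b; sensitive to vector length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy vectors (illustrative only, not real embeddings).
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])
c = [6.0, 8.0]  # same direction as a, but not normalized

# For unit-length vectors, dot product and cosine similarity agree.
print(dot(a, b), cosine_similarity(a, b))

# a and c point the same way: cosine similarity is maximal,
# yet the Euclidean distance between them is far from zero.
print(cosine_similarity(a, c), euclidean_distance(a, c))
```

In pgvector, these metrics correspond to the `<->` (Euclidean distance), `<=>` (cosine distance), and `<#>` (negative inner product) operators.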
Additionally, I added an option to use DiskANN from pgvectorscale, an approximate nearest neighbor search algorithm that streams the index efficiently from persistent storage. The option is exposed in Mem0's PGVector configuration and defaults to True, since nothing breaks when pgvectorscale is not installed. The code checks whether pgvectorscale is installed before creating the index, and checks that the index does not already exist.
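The detect-then-create flow described above could look roughly like the sketch below. This is not the code from this PR: the function name `ensure_diskann_index`, the default column name, and the index naming scheme are assumptions for illustration. It assumes a DB-API cursor with psycopg2-style `%s` placeholders, and that the extension is registered in `pg_extension` under the name `vectorscale`.

```python
def ensure_diskann_index(cur, table, column="vector"):
    """Create a DiskANN index on table.column if pgvectorscale is installed.

    Returns True if a CREATE INDEX statement was issued, False otherwise.
    `cur` is any DB-API cursor (for example from psycopg2).
    """
    # Identifiers cannot be passed as query parameters, so validate them
    # before interpolating them into SQL text.
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("table and column must be simple identifiers")

    # Step 1: is the pgvectorscale extension installed in this database?
    cur.execute("SELECT 1 FROM pg_extension WHERE extname = %s", ("vectorscale",))
    if cur.fetchone() is None:
        return False  # fall back to whatever index or scan is already in use

    # Step 2: does the index already exist? (hypothetical naming scheme)
    index_name = f"{table}_diskann_idx"
    cur.execute(
        "SELECT 1 FROM pg_indexes WHERE tablename = %s AND indexname = %s",
        (table, index_name),
    )
    if cur.fetchone() is not None:
        return False

    # Step 3: build the index with the cosine operator class so that it
    # serves queries ordered by the <=> (cosine distance) operator.
    cur.execute(
        f"CREATE INDEX {index_name} ON {table} "
        f"USING diskann ({column} vector_cosine_ops)"
    )
    return True
```

Because the function returns early when the extension is missing, leaving the option enabled by default is harmless on databases that only have pgvector.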
Relevant documentation was updated.
Best regards,
Applied Data Science Engineer
Proudly representing NetFire
Type of change
[x] New feature (non-breaking change which adds functionality)
How Has This Been Tested?
Checklist:
Maintainer Checklist