Closed JINO-ROHIT closed 1 month ago
I found a similar closed issue that might be relevant to your question. According to this issue, the default similarity metric is cosine similarity, and it is hardcoded in the code. You can find where it is defined in the llama_index/embeddings/base.py file at line 33.
Regarding your specific questions:
Different Responses from ChromaDB and Qdrant: Yes, it is possible for the responses from ChromaDB and Qdrant to be different due to the similarity metric used. The similarity metric affects the ranking of the nodes based on their similarity scores.
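To make the ranking effect concrete, here is a minimal sketch in plain Python. The vectors and document names are invented toy data, not output from ChromaDB or Qdrant, and the helper functions are illustrative rather than taken from either library.

```python
import math


def dot(a, b):
    """Inner product: grows with vector length as well as alignment."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: depends on the angle only, not on vector length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a, b):
    """Euclidean distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy data, invented for illustration only.
QUERY = [1.0, 0.0]
DOCS = {
    "short_aligned": [0.5, 0.0],
    "long_tilted": [3.0, 1.0],
    "near_point": [1.0, 0.6],
}


def rank(score, higher_is_better):
    """Return document names ordered from best match to worst."""
    return sorted(
        DOCS,
        key=lambda name: score(QUERY, DOCS[name]),
        reverse=higher_is_better,
    )


print("cosine   :", rank(cosine, higher_is_better=True))
print("dot      :", rank(dot, higher_is_better=True))
print("euclidean:", rank(euclidean, higher_is_better=False))
```

The three metrics order the same three documents differently. If the embeddings are normalized to unit length, cosine, dot product, and Euclidean distance all produce the same ranking, so in that case a difference between the two stores would have to come from something other than the metric.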
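Changing the Similarity Metric in Qdrant:
The search_batch and search methods of the QdrantVectorStore class do not select the similarity metric. Their limit parameter, which is set to query.similarity_top_k, only controls how many results are returned. In Qdrant, the distance metric is a property of the collection and is fixed when the collection is created (for example Distance.COSINE, Distance.DOT, or Distance.EUCLID in qdrant-client), so changing the metric means creating the collection with a different distance. Here is the relevant search code: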
    response = self._client.search_batch(
        collection_name=self.collection_name,
        requests=[
            rest.SearchRequest(
                vector=rest.NamedVector(
                    name=DENSE_VECTOR_NAME,
                    vector=query_embedding,
                ),
                limit=query.similarity_top_k,
                filter=query_filter,
                with_payload=True,
            ),
        ],
    )
And for the asynchronous method:
    response = await self._aclient.search_batch(
        collection_name=self.collection_name,
        requests=[
            rest.SearchRequest(
                vector=rest.NamedVector(
                    name=DENSE_VECTOR_NAME,
                    vector=query_embedding,
                ),
                limit=query.similarity_top_k,
                filter=query_filter,
                with_payload=True,
            ),
        ],
    )
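Parameter to Check the Similarity Metric: The query call itself has no parameter that reports which similarity metric is being used, but the metric is recorded in the collection's configuration. With qdrant-client, get_collection returns the collection info, and the distance is stored in its vector parameters (config.params.vectors).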
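As a rough sketch of both points, the snippet below creates a collection with an explicit distance and reads it back. It assumes qdrant-client is installed and uses its in-memory local mode. The collection name "demo", the vector size, and the helpers read_distance and demo are hypothetical names chosen for illustration; this is not how LlamaIndex itself creates collections.

```python
def read_distance(collection_info, vector_name=None):
    """Return the distance metric recorded in a collection's configuration.

    Expects the object returned by QdrantClient.get_collection(...).
    A collection holds either one unnamed vector config or a dict of
    named vector configs, so both shapes are handled here.
    """
    vectors = collection_info.config.params.vectors
    if vectors is None:
        return None
    if isinstance(vectors, dict):
        if vector_name is None:
            # Report the metric of every named vector.
            return {name: params.distance for name, params in vectors.items()}
        return vectors[vector_name].distance
    return vectors.distance


def demo():
    """Create a collection with an explicit metric and read it back."""
    # Imported here so read_distance can be used without qdrant-client.
    from qdrant_client import QdrantClient
    from qdrant_client.http import models as rest

    client = QdrantClient(":memory:")  # local in-memory mode, no server needed
    client.create_collection(
        collection_name="demo",  # hypothetical name
        vectors_config=rest.VectorParams(
            size=4,                      # must match the embedding dimension
            distance=rest.Distance.DOT,  # or COSINE, EUCLID
        ),
    )
    return read_distance(client.get_collection("demo"))
```

Calling demo() should return the distance chosen at creation time. For an existing store, passing client.get_collection(<your collection name>) to read_distance shows which metric that collection was built with.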
For more details, you can refer to the relevant sections in the Qdrant codebase [1].
Question Validation
Question
I'm observing that the responses from the vector stores are quite different. Is this possible?
Could this be because of the similarity metric? How do I change the similarity metric in Qdrant? And is there a parameter to check which similarity metric is being used?