langchain-ai / langchain


Add "similarity_score_threshold" option for MultiVectorRetriever class #23387

Open VomV opened 3 months ago

VomV commented 3 months ago

Proposal for a new feature below by @baptiste-pasquier


Feature request

Add the ability to filter out documents with a similarity score less than a score_threshold in the MultiVectorRetriever.

Motivation

The VectorStoreRetriever base class accepts "similarity_score_threshold" as a search_type. With this option, the retriever calls the .similarity_search_with_relevance_scores() method instead of .similarity_search() and filters out any document whose similarity score is less than score_threshold.

This feature is not implemented in the MultiVectorRetriever class.

Proposal (If applicable)

In the _get_relevant_documents method of MultiVectorRetriever

Replace:

https://github.com/langchain-ai/langchain/blob/b20c2640dac79551685b8aba095ebc6125df928c/libs/langchain/langchain/retrievers/multi_vector.py#L63-L68

With:

if self.search_type == "similarity":
    sub_docs = self.vectorstore.similarity_search(query, **self.search_kwargs)
elif self.search_type == "similarity_score_threshold":
    # score_threshold is expected in search_kwargs; the vector store
    # drops results whose relevance score falls below it.
    sub_docs_and_similarities = (
        self.vectorstore.similarity_search_with_relevance_scores(
            query, **self.search_kwargs
        )
    )
    # Keep only the documents; the scores are not needed afterwards.
    sub_docs = [sub_doc for sub_doc, _ in sub_docs_and_similarities]
elif self.search_type == "mmr":
    sub_docs = self.vectorstore.max_marginal_relevance_search(
        query, **self.search_kwargs
    )
else:
    raise ValueError(f"search_type of {self.search_type} not allowed.")

As in the VectorStoreRetriever base class:

https://github.com/langchain-ai/langchain/blob/b20c2640dac79551685b8aba095ebc6125df928c/libs/core/langchain_core/vectorstores.py#L673-L687
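
The branching proposed above can be tried without LangChain installed. The following is a minimal sketch, not the library's implementation: ToyVectorStore, select_sub_docs, and the hard-coded scores are hypothetical stand-ins. Only the three search method names and the score_threshold keyword are taken from the proposal and the VectorStoreRetriever behaviour it refers to.

```python
from typing import Any, Dict, List, Tuple


class ToyVectorStore:
    """Hypothetical stand-in for a vector store; scores are hard-coded."""

    def __init__(self, scored_docs: List[Tuple[str, float]]) -> None:
        # Keep (document, relevance score) pairs sorted best-first.
        self._scored = sorted(scored_docs, key=lambda p: p[1], reverse=True)

    def similarity_search(self, query: str, k: int = 4, **kwargs: Any) -> List[str]:
        # Plain top-k search: any score_threshold in kwargs is ignored.
        return [doc for doc, _ in self._scored[:k]]

    def similarity_search_with_relevance_scores(
        self, query: str, k: int = 4, **kwargs: Any
    ) -> List[Tuple[str, float]]:
        # Assumption: score_threshold arrives through kwargs and pairs
        # scoring below it are dropped.
        threshold = kwargs.pop("score_threshold", None)
        pairs = self._scored[:k]
        if threshold is not None:
            pairs = [(doc, score) for doc, score in pairs if score >= threshold]
        return pairs

    def max_marginal_relevance_search(
        self, query: str, k: int = 4, **kwargs: Any
    ) -> List[str]:
        # Real MMR re-ranks for diversity; this toy just returns the top k.
        return [doc for doc, _ in self._scored[:k]]


def select_sub_docs(
    vectorstore: ToyVectorStore,
    search_type: str,
    query: str,
    search_kwargs: Dict[str, Any],
) -> List[str]:
    """Mirror the proposed if/elif chain as a free function."""
    if search_type == "similarity":
        return vectorstore.similarity_search(query, **search_kwargs)
    if search_type == "similarity_score_threshold":
        pairs = vectorstore.similarity_search_with_relevance_scores(
            query, **search_kwargs
        )
        return [doc for doc, _ in pairs]
    if search_type == "mmr":
        return vectorstore.max_marginal_relevance_search(query, **search_kwargs)
    raise ValueError(f"search_type of {search_type} not allowed.")


store = ToyVectorStore([("chunk-a", 0.91), ("chunk-b", 0.62), ("chunk-c", 0.18)])
kwargs = {"k": 3, "score_threshold": 0.5}
print(select_sub_docs(store, "similarity_score_threshold", "any query", kwargs))
# -> ['chunk-a', 'chunk-b']
```

With the same search_kwargs, the "similarity" branch returns all three chunks, which is the behaviour the feature request wants to make avoidable.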

Originally posted by @baptiste-pasquier in https://github.com/langchain-ai/langchain/discussions/19404

aperepel commented 3 months ago

Also add the new value to the SearchType enum used by MultiVectorRetriever.
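
As a rough illustration of that enum change, here is a sketch that assumes SearchType is a str-based Enum with similarity and mmr members; the exact definition in multi_vector.py may differ, and the new member name is only the one suggested by this issue.

```python
from enum import Enum


class SearchType(str, Enum):
    """Sketch of the search-type enum with the proposed extra member."""

    similarity = "similarity"
    similarity_score_threshold = "similarity_score_threshold"  # proposed addition
    mmr = "mmr"


# A str-based Enum member compares equal to its plain string value, so the
# string comparisons in the proposed if/elif chain keep working.
print(SearchType.similarity_score_threshold == "similarity_score_threshold")
# -> True
```

Because lookup by value also works (SearchType("similarity_score_threshold")), a plain string supplied by the caller can be converted to the new member.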

wenngong commented 3 months ago

Submitted PR #23539 for this issue.