langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Filter conditions are discarded when using multiple filter conditions in similarity_search_with_relevance_scores #15417

Closed by Lue-C 10 months ago

Lue-C commented 10 months ago

System Info

Windows 10, Python 3.11.5, langchain==0.0.331

Who can help?

No response

Reproduction

I want to do a similarity search on a vectorstore created beforehand and use multiple filtering conditions. The relevant part of the code is as follows:

from langchain.vectorstores import Chroma
from langchain.embeddings import SentenceTransformerEmbeddings
import chromadb

db_path = "my_db"

embeddings = SentenceTransformerEmbeddings(cache_folder='intfloat/multilingual-e5-large')
chroma_client = chromadb.PersistentClient(path=db_path)

db = Chroma(persist_directory=db_path, embedding_function=embeddings, client=chroma_client)

query = "My query"

# value1 and value2 are placeholders for the metadata values to match
filtered = db.similarity_search_with_relevance_scores(k=5, query=query, filter={"key1": value1, "key2": value2})

When using one filtering condition like

filtered = db.similarity_search_with_relevance_scores(k=5, query=query, filter={"key1":value1})

the filtering condition is applied and it works fine. But when using multiple conditions, neither of the conditions is applied.

How can I use multiple filtering conditions?

Expected behavior

I would expect both conditions to be applied, combined by either an "and" or an "or".

dosubot[bot] commented 10 months ago

🤖

Hello @Lue-C! 👋 I'm Dosu, a friendly bot here to assist you while we wait for a human maintainer. I can help solve bugs, answer questions, and guide you on becoming a contributor. Don't hesitate to ask if you need anything, I'm here to help!

Based on the information you provided and the context from the LangChain repository, the filter parameter of the similarity_search_with_relevance_scores method of LangChain's Chroma class appears to handle only a single top-level condition. A dict with several top-level keys, such as {"key1": value1, "key2": value2}, might therefore not be applied as expected.

As a workaround, you can combine several conditions with the $and or $or operator inside the filter parameter:

filtered = db.similarity_search_with_relevance_scores(k=5, query=query, filter={'$and': [{'key1': {'$eq': value1}}, {'key2': {'$eq': value2}}]})

This will return documents whose metadata matches both the key1 and the key2 condition. If you want documents that match at least one of the two conditions, you can use the $or operator:

filtered = db.similarity_search_with_relevance_scores(k=5, query=query, filter={'$or': [{'key1': {'$eq': value1}}, {'key2': {'$eq': value2}}]})

Please note that the $eq operator is used to check for equality. If you want to check if a value is in a list of values, you can use the $in operator:

filtered = db.similarity_search_with_relevance_scores(k=5, query=query, filter={'key1': {'$in': [value1, value2]}})

This will return documents where key1 is either value1 or value2.
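
The operator patterns above can be generated from a plain dict instead of being written by hand. The following is a minimal sketch, not part of LangChain or Chroma: the helper name combine_filters is hypothetical, and it assumes that the vector store accepts the $and/$or/$eq syntax shown above and that a logical operator needs at least two operands (which is why a single condition is returned unwrapped).

```python
from typing import Any, Dict


def combine_filters(conditions: Dict[str, Any], operator: str = "$and") -> Dict[str, Any]:
    """Hypothetical helper: turn {"key1": v1, "key2": v2} into an operator-style filter."""
    if operator not in ("$and", "$or"):
        raise ValueError("operator must be '$and' or '$or'")
    if not conditions:
        raise ValueError("at least one condition is required")

    # One equality clause per metadata key.
    clauses = [{key: {"$eq": value}} for key, value in conditions.items()]

    # Assumption: a logical operator needs two or more operands,
    # so a single clause is returned without a wrapper.
    if len(clauses) == 1:
        return clauses[0]
    return {operator: clauses}
```

The result can then be passed as filter=combine_filters({"key1": value1, "key2": value2}) to similarity_search_with_relevance_scores, or with operator="$or" for the "either" case.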

Please try this solution and let me know if it works for you. If you have any other questions or need further clarification, feel free to ask.

Best, Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

sree1658 commented 3 months ago

@dosubot can you help me with a query for 3 keys: 2 keys combined with "and" and one key with "or"?
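
One possible reading of this request is "(key1 AND key2) OR key3". A minimal sketch of such a filter follows, assuming that Chroma allows $and and $or to be nested inside each other; key3 and the three placeholder values are hypothetical and only illustrate the shape of the dict.

```python
# Placeholder metadata values (hypothetical); replace with real ones.
value1, value2, value3 = "a", "b", "c"

# (key1 == value1 AND key2 == value2) OR key3 == value3
nested_filter = {
    "$or": [
        {"$and": [{"key1": {"$eq": value1}}, {"key2": {"$eq": value2}}]},
        {"key3": {"$eq": value3}},
    ]
}
```

The dict would be passed as filter=nested_filter to db.similarity_search_with_relevance_scores. If the intended logic is instead "key1 AND (key2 OR key3)", swap the nesting so that $and is the outer operator and $or wraps the last two conditions.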