Closed levalencia closed 7 months ago
🤖
Hello @levalencia! It's great to see you again. I hope you've been well. Let's dive into this issue you're experiencing with the LangChain framework.
Based on the information you've provided, it seems like the filters parameter is not being applied when using AzureChatOpenAI with the RetrievalQA chain. This could be due to a number of reasons, and I'll try to provide some potential solutions based on similar issues that have been solved in the past.
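One possible solution is to extend the VectorStoreRetriever class and override its retrieval method to apply the filter manually. In recent langchain-core versions the method to override is _get_relevant_documents; overriding the public get_relevant_documents and calling super() from it may recurse, so treat the method name as version-dependent. Here is an example of how you can do this:
from typing import List

from langchain_core.documents import Document
from langchain_core.vectorstores import VectorStoreRetriever


class FilteredRetriever(VectorStoreRetriever):
    # Override the private hook; the public get_relevant_documents wraps it.
    def _get_relevant_documents(self, query: str, *, run_manager) -> List[Document]:
        results = super()._get_relevant_documents(query, run_manager=run_manager)
        # Post-filter on metadata; this can return fewer than k documents.
        return [doc for doc in results if doc.metadata.get("title") == "25_1_0.pdf"]


retriever = FilteredRetriever(vectorstore=vector_store)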
Then, you can use this FilteredRetriever in your RetrievalQA chain:
from langchain.chains import RetrievalQA

chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)
This solution was suggested in a similar issue: Filtering retrieval with ConversationalRetrievalChain.
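To make the manual filtering step concrete, here is a minimal, library-free sketch. The Doc class and the filter_by_title function are hypothetical stand-ins (not LangChain APIs), and the sample titles are only illustrative. It also shows one caveat of filtering after retrieval: the list can end up shorter than the k results that were requested.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (not a LangChain class)."""

    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def filter_by_title(docs: List[Doc], title: str) -> List[Doc]:
    """Keep only documents whose 'title' metadata equals the given title."""
    return [doc for doc in docs if doc.metadata.get("title") == title]


# Pretend these are the top-3 hits returned by an unfiltered search.
top_k = [
    Doc("chunk a", {"title": "142_2_0.pdf"}),
    Doc("chunk b", {"title": "25_1_0.pdf"}),
    Doc("chunk c", {"title": "99_9_0.docx"}),
]

# Only one of the three hits survives the post-filter.
kept = filter_by_title(top_k, "25_1_0.pdf")
print(len(kept), [doc.page_content for doc in kept])
```

Because the filter runs after the search, a server-side filter is usually preferable when the vector store supports one.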
Another possible solution is to ensure that the filters parameter is being passed correctly to the similarity_search method. In the AzureSearch class, the filters parameter is passed to the self.client.search method of the Azure Search client. This allows the search to be filtered according to the conditions specified in the filters parameter. You can check the source code for this in the libs/langchain/langchain/vectorstores/azuresearch.py file.
If these solutions don't work, it would be helpful to know more about the error or unexpected behavior you're experiencing. For example, are you receiving any error messages? Or are the search results simply not what you're expecting? Also, it would be helpful to know the exact version of LangChain you're using, as well as the version of the Azure Search SDK. This information could help identify if there are any known issues or changes in the recent versions that might be causing this behavior.
I hope this helps! If you have any other questions or need further clarification, please let me know.
Hey, I am also looking for a similar solution. Did you find a solution for this?
@Gauravmahapatrocdm, I was able to get the filter working by passing it in the search keyword arguments.
In the above example, I got it working by using the following:
retriever = vector_store.as_retriever(search_kwargs={"k":3, filters="title eq '25_1_0.pdf'"})
retriever = vector_store.as_retriever(search_kwargs={"k":3, filters="title eq '25_1_0.pdf'"})
@lkartawidjaja How did this work if search_kwargs only accepts a dict and filters=... is not valid dict syntax?
@HeroAntagonist, my bad. It should be:
retriever = vector_store.as_retriever(search_kwargs={"k": 3, "filters": "title eq '25_1_0.pdf'"})
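For reference, here is a small sketch of building that dictionary programmatically. odata_eq is a hypothetical helper, not part of LangChain or the Azure SDK; it assumes a string-valued field and the OData convention of escaping a single quote inside a string literal by doubling it. vector_store is the store from the example above, so the call that uses it is left as a comment.

```python
def odata_eq(field_name: str, value: str) -> str:
    """Build an OData equality clause for a string field (hypothetical helper)."""
    # OData string literals escape a single quote by doubling it.
    escaped = value.replace("'", "''")
    return f"{field_name} eq '{escaped}'"


# The filter goes in as a regular dict entry, with "filters" as a string key.
search_kwargs = {"k": 3, "filters": odata_eq("title", "25_1_0.pdf")}
print(search_kwargs)

# Assumed usage with the vector store from this thread:
# retriever = vector_store.as_retriever(search_kwargs=search_kwargs)
```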
System Info
langchain 0.0.342
langchain-core 0.0.7
azure-search-documents 11.4.0b8
Python: 3.10
Who can help?
@hw
Information
Related Components
Reproduction
The following code works fine:
However, when using it with an LLM:
My output has chunks which don't respect the filter:
142_2_0.pdf 99_9_0.docx 99_9_0.docx 142_2_0.pdf
Expected behavior
The answer generated with source_documents should contain only chunks that respect the given filters.
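One way to check this expectation is to inspect the titles of the returned source documents. The sketch below is only an illustration under stated assumptions: titles_violating_filter is a hypothetical helper, the chain result is assumed to be a dict with a source_documents list whose items expose a metadata dict, and the fake result reuses the file names from the output above.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional


def titles_violating_filter(
    result: Dict[str, Any], expected_title: str
) -> List[Optional[str]]:
    """Return the titles of source documents that do not match the filter."""
    return [
        doc.metadata.get("title")
        for doc in result["source_documents"]
        if doc.metadata.get("title") != expected_title
    ]


# Fake chain result built from the file names reported in this issue.
reported_titles = ["142_2_0.pdf", "99_9_0.docx", "99_9_0.docx", "142_2_0.pdf"]
fake_result = {
    "source_documents": [
        SimpleNamespace(metadata={"title": title}) for title in reported_titles
    ]
}

# Every reported chunk violates the intended title filter.
violations = titles_violating_filter(fake_result, "25_1_0.pdf")
print(violations)
```

An empty list would mean every source document respects the filter.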