langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

can't seem to add filters in ConversationalRetrievalChain #4572

Closed ruturajgh closed 1 year ago

ruturajgh commented 1 year ago

Issue you'd like to raise.

I have a Chroma store that contains 3 to 4 PDFs, and I need to search it for documents whose metadata matches a filter like `filter={'source': 'PDFname'}`, so that it doesn't return different docs containing similar data. This works without any problems when using `similarity_search()` directly.

chain = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0),
    docsearch.as_retriever(),
    memory=memory
)
print(chain({'question':query}))

But I don't understand how to do the same with `ConversationalRetrievalChain`. I have tried

docsearch.as_retriever(kwargs={'filter': {'source': 'pdfname'}})

but it doesn't seem to work.

I also saw something like

retriever = vector_store.as_retriever()
retriever.search_kwargs = {'k': 1}

but it doesn't seem to recognise the `.search_kwargs` attribute. Any help would be appreciated.
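For what it's worth, the pattern both attempts are reaching for — keyword arguments stored on the retriever and splatted into every underlying search call — can be sketched with a small mock. The class and method names below mimic LangChain's `VectorStoreRetriever`, but this is a simplified illustration, not the real implementation:

```python
# Simplified mock illustrating how search_kwargs passed to as_retriever()
# are forwarded to the underlying similarity search on every call.
# Names mimic LangChain's API; this is an illustration only.

class MockVectorStore:
    def __init__(self, docs):
        # docs: list of (text, metadata) pairs
        self.docs = docs

    def similarity_search(self, query, k=4, filter=None):
        # Apply the metadata filter first, then return up to k matches.
        results = [
            (text, meta) for text, meta in self.docs
            if filter is None
            or all(meta.get(key) == val for key, val in filter.items())
        ]
        return results[:k]

    def as_retriever(self, search_kwargs=None):
        return MockRetriever(self, search_kwargs or {})


class MockRetriever:
    def __init__(self, vectorstore, search_kwargs):
        self.vectorstore = vectorstore
        self.search_kwargs = search_kwargs

    def get_relevant_documents(self, query):
        # The stored search_kwargs are splatted into every search call,
        # which is why the filter "sticks" for the retriever's lifetime.
        return self.vectorstore.similarity_search(query, **self.search_kwargs)


store = MockVectorStore([
    ("intro page", {"source": "pdfname"}),
    ("other page", {"source": "other.pdf"}),
])
retriever = store.as_retriever(search_kwargs={"filter": {"source": "pdfname"}})
print(retriever.get_relevant_documents("intro"))
# → [('intro page', {'source': 'pdfname'})]
```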

Suggestion:

No response

imeckr commented 1 year ago

Try this

chain = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0),
    docsearch.as_retriever(search_kwargs={'filter': {'source':'pdfname'}}),
    memory=memory
)
print(chain({'question':query}))

ruturajgh commented 1 year ago

Hey, thanks, your solution works.

But I made some changes to the conversational chain and passed the filter argument along with the inputs object, in case I have to search across multiple PDFs at once. This just propagates down to the similarity search function, passes the filters, and returns docs from the given PDF names:

filter = [{'source': 'pdf_name'}, {'source': 'pdf_name2'}]

print(chain({"question": question, 'filter': filter}))

It's not as clean, but it does the job.
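The OR semantics across sources described above can be sketched as follows — a hypothetical helper (not part of LangChain) that runs one metadata-filtered search per filter dict and merges the results, with `toy_search` standing in for a vector store's filtered similarity search:

```python
# Hypothetical illustration of OR-style filtering across several PDFs:
# run one metadata-filtered search per filter dict, merge, de-duplicate.

def search_with_filters(search_fn, query, filters):
    """Return documents matching ANY of the given metadata filters."""
    seen = set()
    results = []
    for flt in filters:
        for doc in search_fn(query, filter=flt):
            key = (doc["text"], tuple(sorted(doc["metadata"].items())))
            if key not in seen:  # de-duplicate across filter passes
                seen.add(key)
                results.append(doc)
    return results


# A toy corpus and search function, for demonstration only.
DOCS = [
    {"text": "a", "metadata": {"source": "pdf_name"}},
    {"text": "b", "metadata": {"source": "pdf_name2"}},
    {"text": "c", "metadata": {"source": "pdf_name3"}},
]

def toy_search(query, filter=None):
    return [d for d in DOCS
            if all(d["metadata"].get(k) == v for k, v in filter.items())]


filters = [{"source": "pdf_name"}, {"source": "pdf_name2"}]
print([d["text"] for d in search_with_filters(toy_search, "q", filters)])
# → ['a', 'b']
```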


ItsJustSmellz commented 1 year ago


Hi there, I think I'm trying to do something similar. Could you explain more about how you achieved multi-doc filtering?

ItsJustSmellz commented 1 year ago

What do you pass in for 'source'?

javedafroz commented 1 year ago

I am using the chain below. I want to use multiple categories in the filter: my logic is to bring back results from category=c1 OR category=c2 OR category=c3. How can we modify the code below to achieve this?

chain = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0),
    retriever=vectorstore.as_retriever(search_kwargs={'filter': {'category': category}}),
    memory=memory,
    return_source_documents=True
)
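One possibility, assuming the underlying store is Chroma: its where-filter syntax supports an `$or` operator, so the filter could be written as `{'$or': [{'category': 'c1'}, {'category': 'c2'}, {'category': 'c3'}]}` (verify this against your chromadb version). The sketch below evaluates those OR semantics in pure Python as an illustration of the intended behaviour, not Chroma's actual implementation:

```python
# Pure-Python sketch of the OR semantics of a Chroma-style "$or" filter.
# Illustration only; the filter dict shape is what would be passed in
# search_kwargs={'filter': where} on the real retriever.

def matches(metadata, where):
    if "$or" in where:
        # Document matches if ANY clause under $or matches.
        return any(matches(metadata, clause) for clause in where["$or"])
    # Plain clause: every key must equal the given value.
    return all(metadata.get(k) == v for k, v in where.items())


where = {"$or": [{"category": "c1"}, {"category": "c2"}, {"category": "c3"}]}

docs = [
    {"id": 1, "category": "c1"},
    {"id": 2, "category": "c4"},
    {"id": 3, "category": "c3"},
]
print([d["id"] for d in docs if matches(d, where)])
# → [1, 3]
```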