Issue: Can we limit the number of relevant documents returned by AzureCognitiveSearchRetriever?

langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications

https://python.langchain.com

MIT License

Issue: Can we limit the number of relevant documents returned by AzureCognitiveSearchRetriever? #5081

Closed matteo-campana closed 1 year ago

matteo-campana commented 1 year ago

I need to limit the number of documents that AzureCognitiveSearchRetriever returns so that I can aggregate only the most relevant documents. Is there a way to do this with the current functionality or do we need to implement it?

UmerHA commented 1 year ago

Hey, I wrote AzureCognitiveSearchRetriever. 👋🏼

Can you post your full code where you would need it? Is retriever.get_relevant_documents("what is langchain")[:n] not sufficient for your use case?
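The slicing approach suggested above can be sketched as follows. `FakeRetriever` and `top_n_documents` are hypothetical names used only for illustration, standing in for any object that exposes `get_relevant_documents`. The sketch assumes the results come back ordered by relevance, and note that the limit is applied client-side, after every match has already been fetched.

```python
class FakeRetriever:
    """Hypothetical stand-in for a retriever; not part of LangChain."""

    def __init__(self, texts):
        self._texts = list(texts)

    def get_relevant_documents(self, query):
        # A real retriever would query the search service here.
        return list(self._texts)


def top_n_documents(retriever, query, n):
    """Return at most n documents by slicing the full result list."""
    return retriever.get_relevant_documents(query)[:n]


retriever = FakeRetriever(["doc A", "doc B", "doc C"])
print(top_n_documents(retriever, "what is langchain", 2))
```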

magallardo commented 1 year ago

@UmerHA Do you have a sample on using the retriever? I have tried calling the retriever but I am getting connection errors.

Also, what is the content_key parameter?

retriever = AzureCognitiveSearchRetriever(content_key="content")

Thanks, Marcelo

magallardo commented 1 year ago

@UmerHA I was able to figure out the content_key parameter. However, I wanted to know if there is any way to return the whole result so I can get the keys I need. In my case, each result has 5 different keys. Thanks

magallardo commented 1 year ago

@UmerHA Please ignore as I was able to figure it out. Retriever is working great!!! Thanks

harryaultdev commented 1 year ago

@UmerHA Is slicing the only way to handle limiting search results? Can we not push this back to cognitive search to do a top N?

I'm trying to use RetrievalQA with AzureCognitiveSearchRetriever as my retriever. If I do a generic query, it's going to return a ton of documents. Is there no way to limit this on the retriever instance?

qa = RetrievalQA.from_chain_type(
    llm=open_ai,
    chain_type="stuff",
    retriever=AzureCognitiveSearchRetriever(xxxx),
    chain_type_kwargs=chain_type_kwargs,
    verbose=True,
)

reeshabh90 commented 1 year ago

@UmerHA @hwchase17 Currently AzureCognitiveSearchRetriever retrieves all the matching documents. Is it possible to limit it to the 5 most relevant documents and then use RetrievalQA? At the moment I exceed the token limit. My code follows.


from langchain.retrievers import AzureCognitiveSearchRetriever
from langchain.chains import RetrievalQA

retriever = AzureCognitiveSearchRetriever(
    content_key="content",
    index_name=AZURE_SEARCH_INDEX,
    api_key=search_key,
    service_name=AZURE_SEARCH_SERVICE,
)
docs = retriever.get_relevant_documents(user_input)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
    return_source_documents=True,
    chain_type="refine"
)
result = qa_chain({"query": user_input})

I get the following error: "This model's maximum context length is 4097 tokens, however you requested 6212 tokens"

Reason: the retriever currently tries to perform refine over all documents, but I want it to refine over only the top 5 relevant documents, so that the token limit is not exceeded.
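A count limit alone does not guarantee that the prompt fits the model's context window, since document lengths vary. As a rough illustration, the documents could also be trimmed against a token budget before they reach the chain. This is only a sketch: `trim_to_token_budget` is a hypothetical helper, and the 4-characters-per-token ratio is a crude assumption, not the model's real tokenizer.

```python
def trim_to_token_budget(texts, max_tokens, chars_per_token=4):
    """Keep documents, in order, until the estimated token budget is used up.

    The token count is estimated as ceil(len(text) / chars_per_token),
    which is a rough heuristic and not an exact tokenizer count.
    """
    kept = []
    used = 0
    for text in texts:
        cost = -(-len(text) // chars_per_token)  # ceiling division
        if used + cost > max_tokens:
            break
        kept.append(text)
        used += cost
    return kept


docs = ["a" * 40, "b" * 40, "c" * 40]  # roughly 10 estimated tokens each
print(len(trim_to_token_budget(docs, max_tokens=25)))
```

A real tokenizer would give a more accurate count than the character heuristic used here.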

UmerHA commented 1 year ago

@harryaultdev @reeshabh90 Added that option - see #7690. You can use a top_n parameter to limit the number of documents returned

from langchain.retrievers import AzureCognitiveSearchRetriever
from langchain.chains import RetrievalQA

retriever = AzureCognitiveSearchRetriever(<all the other stuff>, top_k=10)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
    return_source_documents=True,
    chain_type="refine"
)
result = qa_chain({"query": user_input})
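On the earlier question of pushing the limit back to the search service: the Azure Cognitive Search REST API accepts a `$top` query parameter on a Search Documents request, so the service itself can cap the result count. The sketch below only builds such a URL. `build_search_url` is a hypothetical helper, the default API version is an assumption, and this is not necessarily how the retriever constructs its request internally.

```python
from urllib.parse import urlencode


def build_search_url(service_name, index_name, query, top_k=None,
                     api_version="2020-06-30"):
    """Build a Search Documents URL, adding $top only when top_k is set."""
    params = {"api-version": api_version, "search": query}
    if top_k is not None:
        params["$top"] = top_k
    base = f"https://{service_name}.search.windows.net/indexes/{index_name}/docs"
    # safe="$" keeps the literal "$" in "$top" instead of encoding it as %24.
    return base + "?" + urlencode(params, safe="$")


print(build_search_url("my-service", "my-index", "what is langchain", top_k=5))
```

With a server-side limit, only the requested number of results travels over the network, unlike slicing the full list on the client.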

reeshabh90 commented 1 year ago

Hi @UmerHA I also created a PR: 7692

Kindly review

reeshabh90 commented 1 year ago

Hey @UmerHA, apologies, I just saw the PR you have already created. Awesome!

UmerHA commented 1 year ago

> Hey @UmerHA, apologies, I just saw the PR you have already created. Awesome!

No worries! Diving in and solving your problems yourself is exactly the right attitude. Keep it up!

reeshabh90 commented 1 year ago

@baskaryan @UmerHA Do we need to upgrade the LangChain version via pip?

UmerHA commented 1 year ago

> @baskaryan @UmerHA Do we need to upgrade the LangChain version via pip?

Yes, run pip install -U langchain, where -U stands for --upgrade.

reeshabh90 commented 1 year ago

I still get the following error: 1 validation error for AzureCognitiveSearchRetriever top_n extra fields not permitted (type=value_error.extra)

UmerHA commented 1 year ago

> I still get the following error: 1 validation error for AzureCognitiveSearchRetriever top_n extra fields not permitted (type=value_error.extra)

Sorry, it's top_k now, not top_n, so it's retriever = AzureCognitiveSearchRetriever(<all the other stuff>, top_k=10)

reeshabh90 commented 1 year ago

Okay. However, even after a fresh installation of langchain, this does not show up in the code. Has the new change been merged into the main code?

datarootsian101 commented 1 year ago

> Okay. However, even after a fresh installation of langchain, this does not show up in the code. Has the new change been merged into the main code?

I just checked the master branch and the update is there, but I don't think it has been released yet. That could be why upgrading via pip doesn't pick it up.
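One way to tell whether an installed copy already includes the new option is to check the package version and whether the class declares the field. The sketch below is a hedged illustration: `installed_version` and `accepts_field` are hypothetical helpers, and the field lookup assumes the retriever is a pydantic model exposing `model_fields` or `__fields__`, which depends on the installed versions.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def accepts_field(model_cls, field_name):
    """Check whether a pydantic-style class declares the given field."""
    fields = getattr(model_cls, "model_fields", None) or getattr(
        model_cls, "__fields__", {}
    )
    return field_name in fields


print("langchain version:", installed_version("langchain"))

# With langchain installed, the check might look like this (left as a comment
# because it needs the package to be importable):
# from langchain.retrievers import AzureCognitiveSearchRetriever
# print(accepts_field(AzureCognitiveSearchRetriever, "top_k"))
```

If the field is missing, passing `top_k` would be expected to fail with the same "extra fields not permitted" validation error quoted above.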