langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
92.48k stars · 14.8k forks

Can't Specify Top-K retrieved Documents in Multimodal Retrievers using Invoke() #23158

Open benjamin-meinhardt opened 3 months ago

benjamin-meinhardt commented 3 months ago

Example Code

# Check retrieval
query = "What are the EV / NTM and NTM rev growth for MongoDB, Cloudflare, and Datadog?"
docs = retriever_multi_vector_img.invoke(query, limit=6)

# We get 4 docs
len(docs)

Error Message and Stack Trace (if applicable)

No response

Description

retriever_multi_vector_img.invoke(query) no longer accepts a way to limit or increase the number of docs returned and subsequently passed to the LLM. The count defaults to 4, and I can't find any documentation on how to change it.

You can see the incorrect usage in this cookbook: https://github.com/langchain-ai/langchain/blob/master/cookbook/Multi_modal_RAG.ipynb

# Check retrieval
query = "What are the EV / NTM and NTM rev growth for MongoDB, Cloudflare, and Datadog?"
docs = retriever_multi_vector_img.invoke(query, limit=6)

# We get 4 docs
len(docs)

Here limit=6 was passed, but only 4 docs are returned. How can we force more docs to be returned?
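The behaviour reported here is consistent with invoke-time keyword arguments simply being dropped: the retrieval count is fixed by configuration set when the retriever is constructed, not by arguments to invoke(). A minimal self-contained sketch of that pattern (FakeRetriever is a hypothetical stand-in, not the actual langchain class):

```python
# Hypothetical stand-in illustrating why invoke(query, limit=6) still
# returns 4 docs: extra invoke kwargs are silently ignored, and the
# result size comes from search_kwargs fixed at construction time.
class FakeRetriever:
    def __init__(self, docs, search_kwargs=None):
        self.docs = docs
        self.search_kwargs = search_kwargs or {}

    def invoke(self, query, **kwargs):
        # kwargs such as limit=6 are dropped, mirroring the issue.
        k = self.search_kwargs.get("k", 4)  # 4 matches langchain's default
        return self.docs[:k]

corpus = [f"doc-{i}" for i in range(10)]

default_retriever = FakeRetriever(corpus)
print(len(default_retriever.invoke("query", limit=6)))  # 4: limit is ignored

tuned_retriever = FakeRetriever(corpus, search_kwargs={"k": 6})
print(len(tuned_retriever.invoke("query")))  # 6
```

Under this model, the fix is to configure k on the retriever itself rather than at call time.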

System Info

Using langchain_core 0.2.8

WithFoxSquirrel commented 2 months ago

Try the following to get the top-k results:

retriever_multi_vector_img = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={'k': 3, 'fetch_k': 5}
)

then:

retriever_multi_vector_img.invoke(question)
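The mechanics of the suggestion above can be sketched with an in-memory stand-in (FakeVectorStore and FakeRetriever are hypothetical, not langchain classes): with search_type="mmr", fetch_k candidates are fetched and then k of them are kept, so k controls how many docs invoke() returns.

```python
# Hypothetical stand-in for vectorstore.as_retriever(...), showing how
# search_kwargs (not invoke-time arguments) fixes the result count.
class FakeRetriever:
    def __init__(self, docs, search_type, search_kwargs):
        self.docs = docs
        self.search_type = search_type
        self.search_kwargs = search_kwargs

    def invoke(self, query):
        # With "mmr", fetch_k candidates are gathered first, then k of
        # them are kept after diversity re-ranking (re-ranking elided).
        fetch_k = self.search_kwargs.get("fetch_k", 20)
        k = self.search_kwargs.get("k", 4)
        candidates = self.docs[:fetch_k]
        return candidates[:k]

class FakeVectorStore:
    def __init__(self, docs):
        self.docs = docs

    def as_retriever(self, search_type="similarity", search_kwargs=None):
        return FakeRetriever(self.docs, search_type, search_kwargs or {})

store = FakeVectorStore([f"doc-{i}" for i in range(10)])
retriever = store.as_retriever(search_type="mmr",
                               search_kwargs={"k": 3, "fetch_k": 5})
print(len(retriever.invoke("question")))  # 3
```

To return 6 docs as the original question asks, the same pattern would use search_kwargs={"k": 6}.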