langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Langchain RetrievalQAWithSourcesChain throwing ValueError: Missing some input keys: {'context'} #24229

Open mertzamir opened 1 month ago

mertzamir commented 1 month ago

Example Code

```python
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    PromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain_openai import ChatOpenAI
from langchain_community.vectorstores import Redis

from chatbot_api import config

_INDEX_NAME = "Postmarket"

rds = Redis.from_existing_index(
    embedding=config.OPEN_AI_EMBEDDINGS,
    index_name=_INDEX_NAME,
    schema=config.INDEX_SCHEMA,
    redis_url=config.REDIS_URL,
)

_template = """Your job is to use information on the documents
to answer questions about postmarket operations. Use the following
context to answer questions. Be as detailed as possible, but don't
make up any information that's not from the context. If you don't
know an answer, say you don't know. If you refer to a document, cite
your reference.
{context}
"""

system_prompt = SystemMessagePromptTemplate(
    prompt=PromptTemplate(input_variables=["context"], template=_template)
)

human_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(input_variables=["question"], template="{question}")
)
messages = [system_prompt, human_prompt]

postmarket_prompt = ChatPromptTemplate(input_variables=["context", "question"], messages=messages)

postmarket_chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=ChatOpenAI(model=config.QA_MODEL, temperature=config.TEMPERATURE),
    chain_type="stuff",
    retriever=rds.as_retriever(search_type="similarity", search_kwargs={"k": 8}),
    return_source_documents=True,
    # chain_type_kwargs={"prompt": postmarket_prompt},  # this also doesn't work, throwing
    # ValueError -> document_variable_name summaries was not found in llm_chain
    # input_variables: ['context', 'question']
    verbose=True,
)
postmarket_chain.combine_documents_chain.llm_chain.prompt = postmarket_prompt
```

Then `postmarket_chain` is used by the tool I defined in my LangChain agent, with `func=postmarket_chain.invoke`.

Error Message and Stack Trace (if applicable)

```
[chain/start] [chain:AgentExecutor > tool:Premarket > chain:RetrievalQAWithSourcesChain] Entering Chain run with input:
{
  "question": "What are the procedures for submitting an application for a new medical device?",
  "history": []
}
[chain/start] [chain:AgentExecutor > tool:Premarket > chain:RetrievalQAWithSourcesChain > chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:AgentExecutor > tool:Premarket > chain:RetrievalQAWithSourcesChain > chain:StuffDocumentsChain > chain:LLMChain] Entering Chain run with input:
{
  "question": "What are the procedures for submitting an application for a new medical device?",
  "summaries": "Content: Page 12D. Promotional Literature\nAny (I'm cutting the rest but this text is fetched from my vectorstore, I can confirm)"
}
[llm/start] [chain:AgentExecutor > tool:Premarket > chain:RetrievalQAWithSourcesChain > chain:StuffDocumentsChain > chain:LLMChain > llm:ChatOpenAI] Entering LLM run with input:
{
  "prompts": [
    "System: Your job is to use information on documents\nto answer questions about premarket operations. Use the following\ncontext to answer questions. Be as detailed as possible, but don't\nmake up any information that's not from the context. If you don't\nknow an answer, say you don't know. If you refer to a document, cite\nyour reference.\n{context}\n\nHuman: What are the procedures for submitting an application for a new medical device?"
  ]
}
[llm/end] [chain:AgentExecutor > tool:Premarket > chain:RetrievalQAWithSourcesChain > chain:StuffDocumentsChain > chain:LLMChain > llm:ChatOpenAI] [5.16s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "I don't have the specific documents or guidelines available in the provided context to detail the procedures for submitting a 510(k) notification for a new medical device. Typically, this process involves preparing and submitting a premarket notification to the FDA to demonstrate that the new device is substantially equivalent to a legally marketed device (predicate device) not subject to premarket approval (PMA). The submission includes information about the device, its intended use, and comparative analyses, among other data. For detailed steps and requirements, it is best to refer directly to the relevant FDA guidelines or documents.",
        "generation_info": {
          "finish_reason": "stop",
          "logprobs": null
        },
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
```

Description

I have a multimodal RAG system that generates answers from texts parsed out of hundreds of PDFs and retrieved from my Redis vectorstore. I have several chains (RetrievalQAWithSourcesChain) that find relevant contextual texts in the vectorstore and append them to my chatbot's LLM calls. I'm having trouble correctly adding the context to the system prompt: the code below throws `ValueError: Missing some input keys: {'context'}`.
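The mismatch can be sketched in plain Python with `str.format` (an assumption about the mechanics; LangChain raises a `ValueError` from its own input validation rather than the `KeyError` shown here): the "stuff" chain supplies the retrieved documents under the default variable name `summaries`, while the custom prompt declares `context`.

```python
# Sketch of the key mismatch: the chain supplies "summaries",
# but the prompt template expects a value named "context".
template = "Use the following context to answer:\n{context}\n\nQuestion: {question}"
supplied = {"summaries": "retrieved document text", "question": "What are the procedures?"}

try:
    template.format(**supplied)
    error = None
except KeyError as exc:
    # No value was provided under the name "context"
    error = exc
```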

The RetrievalQAWithSourcesChain is supposed to use the Redis retriever and fill the extracted texts into `{context}`, I believe, but it seems it can't, or there's something else I can't see.

Surprisingly, it works when I use double brackets around 'context' in the prompt, i.e. `{{context}}`. However, when I examine the logs of the intermediate steps the agent takes while using its tools to generate an answer, my understanding is that the context is never passed, and the LLM just answers from its own knowledge without using any of the contextual info that's supposed to come from the vectorstore. In the logs above, notice how text returned from the vectorstore is included in `summaries`, but when StuffDocumentsChain passes that to `llm:ChatOpenAI`, it's not injected into the system prompt (scroll right to see); the context field remains as the literal `{context}` (the outer brackets were dropped).

Am I right in my assumption that the context is not being passed into the context window correctly? How can I fix this? All the examples I see from other projects use single brackets around context when they include it in the system prompt. However, I could only make the code run with double brackets, and that seems not to inject the context at all.
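A minimal sketch of why the double-bracket version "runs" without injecting anything, using plain `str.format` semantics (which f-string prompt templates follow): `{{` and `}}` escape to literal braces, so `{{context}}` is no longer an input variable at all.

```python
# With doubled brackets, "context" is not a template variable:
# "{{context}}" is str.format's escape for the literal text "{context}".
# Validation passes, but the literal placeholder survives in the output
# and no retrieved documents are substituted in.
template = "Use the following context to answer:\n{{context}}\n\nQuestion: {question}"
rendered = template.format(question="What are the procedures?")
print(rendered)
```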

Could this be due to the index schema I used when creating the vectorstore? The schema, for reference:

```yaml
text:
- name: content
- name: source
numeric:
- name: start_index
- name: page
vector:
- name: content_vector
  algorithm: HNSW
  datatype: FLOAT32
  dims: 384
  distance_metric: COSINE
```

System Info

```
langchain==0.2.7
langchain-community==0.2.7
langchain-core==0.2.16
langchain-openai==0.1.15
langchain-text-splitters==0.2.2
langchainhub==0.1.20
```

Python 3.12.4

OS: macOS Sonoma 14.4.1

havkerboi123 commented 1 month ago

Can I work on this?

mertzamir commented 1 month ago

What do you mean? 😅

keenborder786 commented 1 month ago

@havkerboi123 you are not passing the correct input variable name for the document variable in your prompt. Just do this and it should work:


```python
postmarket_chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=ChatOpenAI(model=config.QA_MODEL, temperature=config.TEMPERATURE),
    chain_type="stuff",
    retriever=rds.as_retriever(search_type="similarity", search_kwargs={"k": 8}),
    return_source_documents=True,
    verbose=True,
    chain_type_kwargs={"document_variable_name": "context", "prompt": postmarket_prompt},
)
```
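In plain-Python terms (a sketch of the effect, not of LangChain internals), `document_variable_name` tells the stuff chain which key to put the joined documents under, so it matches the variable the custom prompt declares:

```python
# Sketch of what document_variable_name changes: the stuffed documents
# are supplied under the key the custom prompt actually expects, so
# formatting succeeds and the retrieved text reaches the model.
document_variable_name = "context"
template = "Use the following context to answer:\n{context}\n\nQuestion: {question}"
supplied = {
    document_variable_name: "retrieved document text",
    "question": "What are the procedures?",
}
rendered = template.format(**supplied)
print(rendered)
```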
mertzamir commented 3 weeks ago

@keenborder786 thanks for your answer, that worked for me! One last question: would adding the prompt field to `chain_type_kwargs`, as in your answer, make the following line obsolete?

```python
postmarket_chain.combine_documents_chain.llm_chain.prompt = postmarket_prompt
```