langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

return_source_documents does not work for ConversationalRetrievalChain #16323

Closed Teekyy closed 5 months ago

Teekyy commented 5 months ago

Example Code

The following code initializes the chatbot using ConversationalRetrievalChain with return_source_documents=True:

import os

from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ConversationBufferWindowMemory
from langchain.vectorstores import Pinecone

def initialize_chatbot(index_name):
    chatbot = ChatOpenAI(
        openai_api_key=os.environ["OPENAI_API_KEY"],
        model='gpt-3.5-turbo',
        temperature=0.2
    )

    embeddings = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])
    vectorstore = Pinecone.from_existing_index(index_name, embeddings)
    retriever = vectorstore.as_retriever()

    # Keep the last 10 exchanges in the conversation history
    memory = ConversationBufferWindowMemory(
        k=10,
        memory_key="chat_history",
        return_messages=True
    )

    qa = ConversationalRetrievalChain.from_llm(
        llm=chatbot,
        retriever=retriever,
        memory=memory,
        return_source_documents=True
    )

    return qa

The following code runs a query:

def chat(query, qa):
    response = qa(query)

    print(response)

qa = initialize_chatbot(index_name)  # index_name: the Pinecone index used above
query = "what is the nutrition information for the boiled egg recipe?"
chat(query, qa)

The error I get:

File [langchain/chains/base.py:314], in Chain.__call__(self, inputs, return_only_outputs, callbacks, tags, metadata, run_name, include_run_info)
    [312]     raise e
    [313] run_manager.on_chain_end(outputs)
--> [314] final_outputs: Dict[str, Any] = self.prep_outputs(
    [315]     inputs, outputs, return_only_outputs
    [316] )
    [317] if include_run_info:
    [318]     final_outputs[RUN_KEY] = RunInfo(run_id=run_manager.run_id)

File [langchain/chains/base.py:410], in Chain.prep_outputs(self, inputs, outputs, return_only_outputs)
    [408] self._validate_outputs(outputs)
    [409] if self.memory is not None:
--> [410]     self.memory.save_context(inputs, outputs)
    [411] if return_only_outputs:
...
--> [29]         raise ValueError(f"One output key expected, got {outputs.keys()}")
    [30]     output_key = list(outputs.keys())[0]
    [31] else:

ValueError: One output key expected, got dict_keys(['answer', 'source_documents'])

Description

I am trying to use the LangChain library to return source documents with ConversationalRetrievalChain, but I keep getting an error saying that only one output key is expected. I looked into the code and believe it is executing the __call__ function (deprecated in LangChain 0.1.0), which expects only one output key. I am using the most recent version that pip allows (pip install --upgrade langchain), which is 0.1.1. How can I get this to execute properly?
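For what it's worth, the chain itself confirms that two output keys are emitted; a quick check against the qa object returned by initialize_chatbot above:

print(qa.output_keys)  # ['answer', 'source_documents'] when return_source_documents=True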

System Info

"pip install --upgrade langchain" Python 3.11.5 Langchain 1.1.0

dosubot[bot] commented 5 months ago

🤖

Hey @Teekyy! Good to see you back here. I hope you've been having fun coding! Let's dive into this issue you're having.

Based on the error message you provided, the issue is related to the output_keys property of the BaseConversationalRetrievalChain class. This property returns a list of the keys that will be present in the chain's output dictionary. It always includes output_key ("answer" by default), and it also includes "source_documents" and "generated_question" when the return_source_documents and return_generated_question attributes are set to True, respectively.

The code that processes the chain's output expects only one output key, but because you have return_source_documents set to True, it receives two keys, "answer" and "source_documents", which causes the error.

To resolve this, you can set return_source_documents to False when initializing your ConversationalRetrievalChain. Here's an example:

qa = ConversationalRetrievalChain.from_llm(
    llm=chatbot, 
    retriever=retriever, 
    memory=memory,
    return_source_documents=False
)

This will ensure that the output of the chain only includes the "answer" key, which should resolve the error. If you still need to access the source documents, you may need to modify the function that's processing the output to handle multiple keys.
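For illustration, here is a sketch of a chat function that handles both keys; the key names are taken from the error message above, and the page_content/metadata accesses assume the standard Document objects that retrievers return:

def chat(query, qa):
    response = qa({"question": query})
    print(response["answer"])
    # Each source document is a Document with page_content and metadata
    for doc in response["source_documents"]:
        print(doc.metadata, doc.page_content[:200])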

You can find more information about this in the LangChain codebase.

Please let me know if this resolves your issue or if you need further assistance.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

Teekyy commented 5 months ago

Obviously in this case I want the source documents, so setting that parameter to False would not be ideal.

Teekyy commented 5 months ago

Upon further investigation: if you remove the memory parameter and instead run the query as response = qa({"question": query, "chat_history": ""}), it seems to work. It seems like something is interfering with the memory component, but I'm not sure why.
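A sketch of that workaround in full (same chatbot and retriever as in the original snippet, with the memory object dropped and the chat history passed in manually):

qa = ConversationalRetrievalChain.from_llm(
    llm=chatbot,
    retriever=retriever,
    return_source_documents=True  # no memory attached
)

response = qa({"question": query, "chat_history": ""})
print(response["answer"])
print(response["source_documents"])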

Teekyy commented 5 months ago

Solved by this issue: #6741
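For anyone who lands here: as I understand the linked issue, the fix is to tell the memory which output key to save, so save_context no longer has to guess between "answer" and "source_documents". A minimal sketch adapted to the setup above:

memory = ConversationBufferWindowMemory(
    k=10,
    memory_key="chat_history",
    return_messages=True,
    output_key="answer"  # tells save_context which value to store in history
)

qa = ConversationalRetrievalChain.from_llm(
    llm=chatbot,
    retriever=retriever,
    memory=memory,
    return_source_documents=True
)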