langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

ConversationalRetrievalChain + Memory #2303

Open da-bu opened 1 year ago

da-bu commented 1 year ago

Hi,

I'm following the Chat index examples and was surprised that the history is not a Memory object but just an array. However, it is possible to pass a memory object to the constructor, if

  1. I also set memory_key to 'chat_history' (default key names are different between ConversationBufferMemory and ConversationalRetrievalChain)
  2. I also adjust get_chat_history to pass through the history from the memory, i.e. lambda h : h.

This is what that looks like:

memory = ConversationBufferMemory(memory_key='chat_history', return_messages=False)
conv_qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm, 
    retriever=retriever, 
    memory=memory,
    get_chat_history=lambda h : h)

Now, my issue is that if I also want to return sources, that doesn't work with the memory - i.e. this does not work:

memory = ConversationBufferMemory(memory_key='chat_history', return_messages=False)
conv_qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm, 
    retriever=retriever, 
    memory=memory,
    get_chat_history=lambda h : h,
    return_source_documents=True)

The error message is "ValueError: One output key expected, got dict_keys(['answer', 'source_documents'])".

Maybe I'm doing something wrong? If not, this seems worth fixing to me - or, more generally, making memory and the ConversationalRetrievalChain more directly compatible?

talhaanwarch commented 1 year ago

OK, in that case you need to persist the memory somewhere - either in a cache or in a database. Have a look at my project; I'm saving it in SQLite: https://github.com/talhaanwarch/doc_chat_api. LangChain also provides different options such as Redis, Postgres, etc.: https://python.langchain.com/en/latest/modules/memory/how_to_guides.html

I suggest you set verbose=True to understand the process better.
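For reference, a minimal sketch of one way to persist the buffer with LangChain's Redis-backed message history (the session id and Redis URL are placeholder assumptions; the SQLite approach in the linked repo works the same way through a different chat-message-history class):

from langchain.memory import ConversationBufferMemory, RedisChatMessageHistory

# Persist messages in Redis so the history survives across requests/processes
message_history = RedisChatMessageHistory(session_id="user-123", url="redis://localhost:6379/0")
memory = ConversationBufferMemory(
    memory_key="chat_history",
    chat_memory=message_history,  # plug the persistent store into the memory object
    return_messages=True,
)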

samthedataman commented 1 year ago

I am setting verbose=True and it's simply not behaving as it should - the chat history does not persist between chats for some reason, and saving it in SQLite as you suggest is not feasible for me. I wonder why this behavior exists.

talhaanwarch commented 1 year ago

You have to persist it; by default it's not persistent.


samthedataman commented 1 year ago


I ended up combining your techniques with @esgdao's techniques and it worked for my use case! Off to refine the chain - this was wild to set up, by the way.


import os

import openai
import streamlit as st
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.memory import ConversationSummaryBufferMemory
from langchain.prompts import PromptTemplate
from langchain.vectorstores import FAISS


@st.cache_resource
def init_memory():
    # Cache the memory object so Streamlit reruns don't wipe the chat history
    return ConversationSummaryBufferMemory(
        llm=ChatOpenAI(temperature=0.1),
        output_key='answer',
        memory_key='chat_history',
        return_messages=True)

def retrieve_best_answer(full_user_question: str):
    openai.api_key = os.getenv("OPEN_API_KEY")
    embeddings = OpenAIEmbeddings()
    llm = ChatOpenAI(temperature=0.1)
    vectordb = FAISS.load_local("merged_faiss_index", embeddings)

    prompt_template_doc = """
        Use chat history : {chat_history} to determine the condition you are to research if not blank

        Use the following pieces of context to answer the question at the end.
        {context}
        If you still can't find the answer, just say that you don't know, don't try to make up an answer.
        You can also look into chat history.
        {chat_history}
        Question: {question}
        Answer:
        """

    prompt_doc = PromptTemplate(
        template=prompt_template_doc,
        input_variables=["context", "question", "chat_history"],
    )

    qa = ConversationalRetrievalChain.from_llm(
        ChatOpenAI(temperature=0.1),
        vectordb.as_retriever(),
        memory=init_memory(),
        combine_docs_chain_kwargs={'prompt': prompt_doc})  # pass the custom prompt so it is actually used

    results = qa({"question": full_user_question})

    return results["answer"], results["chat_history"]
cheevahagadog commented 1 year ago

Thanks to the commenters who've worked on this issue. I wanted to create a chatbot with these four things:

  1. Prompt: The ability to customize the tone/feel of the bot with a system prompt
  2. DocSearch: This is the chatting over an index ability
  3. Memory: The bot should retain previous messages and be able to think about them
  4. Citations: The model should be able to bring in references to the answers it gives

After many iterations and some help on this thread (shout-out to @esgdao for the tip about setting the prompt), I have a chatbot that meets all these requirements. Here's my setup (note: I'm using langchain-0.0.200):

# Imports
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import SystemMessagePromptTemplate
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

# Load the VectorDB
embeddings = OpenAIEmbeddings()
vectordb = Chroma(
    collection_name='<<collection_name>>', 
    persist_directory=".chromadb/", 
    embedding_function=embeddings
)

# Create the multipurpose chain
qachat = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(temperature=0), 
    retriever=vectordb.as_retriever(),  # ☜ DOCSEARCH
    return_source_documents=True        # ☜ CITATIONS
)

# PROMPT 👇
sys_prompt = "Act as a friendly and helpful customer support rep. 
Answer questions about <<company's>> products and services."
qachat.combine_docs_chain.llm_chain.prompt.messages[0] = SystemMessagePromptTemplate.from_template(sys_prompt)

# MEMORY 👇
chat_history = []

## Question 1
query = "Hi there, how are you?"
result = qachat({"question": query, "chat_history": chat_history})
print(result['answer'])
# "Hello! I'm an AI language model, so I don't have feelings, but I'm here to assist you with any questions you may have about <<company's>> products and services. How can I help you today?"

## Question 2, updating the chat_history object
chat_history = [(query, result["answer"])]
query = "Can you tell me where does X report pull from to get the Patient Confidential field?"
result = qachat({"question": query, "chat_history": chat_history})
print(result['answer'])
# "The X report pulls the Patient Confidential field from the patient\'s chart in <<Product>>. This field can be found under the "Patient Information" section of t...'"

## Question 3, testing the memory is working
chat_history = [(query, result["answer"])]
query = "Do I need a specific role to access that?"  # ☜ Testing for memory
result = qachat({"question": query, "chat_history": chat_history})
print(result['answer'])
# "To access the Patient Confidential field in the patient\'s chart in <<Product>>, you need to have the "Patient Confidential" permission enabled in your user role. This permission is typically granted to users who..."

After multiple attempts and combinations this was the only setup that worked for me. Hopefully it gets worked out in future versions.

robertocommit commented 1 year ago

@cheevahagadog thanks so much for this.

I am using from langchain.chains import RetrievalQA

and the way I adapt your code to work with it is the following:

chain.combine_documents_chain.llm_chain.prompt.messages[0] = SystemMessagePromptTemplate.from_template(sys_prompt)

Now I am wondering: is there a more "elegant" way to declare an initial prompt?

Doing it this way seems like a bit of a hack.

Thanks again

dcellison commented 1 year ago

@robertocommit There is a way that's not a hack, actually. If you use ConversationalRetrievalChain.from_llm you can provide this as a parameter:

combine_docs_chain_kwargs={'prompt': prompt}

Not sure if this works with RetrievalQA, though.
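For anyone looking for a concrete example, here is a minimal sketch of passing a custom QA prompt that way (llm and retriever are assumed to already exist; the prompt text is illustrative):

from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate

qa_prompt = PromptTemplate(
    template=(
        "Use the following context to answer the question.\n"
        "{context}\n"
        "Question: {question}\n"
        "Answer:"
    ),
    input_variables=["context", "question"],
)

chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    combine_docs_chain_kwargs={"prompt": qa_prompt},  # custom prompt for the combine-docs step
)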

dcellison commented 1 year ago

@robertocommit It looks like BaseRetrievalQA.from_llm() (and hence RetrievalQA) accepts a prompt argument, so you should be able to supply a custom prompt there.
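A hedged sketch of the RetrievalQA variant (assuming llm, retriever, and a qa_prompt with context/question variables like the one above):

from langchain.chains import RetrievalQA

qa = RetrievalQA.from_llm(
    llm=llm,
    retriever=retriever,
    prompt=qa_prompt,  # prompt argument accepted by BaseRetrievalQA.from_llm
)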

zakcroft commented 1 year ago

This worked for me. You can bypass the function with get_chat_history=lambda h: h, which just returns the string.

For example, in ConversationalRetrievalChain:

memory=ConversationBufferMemory(memory_key="chat_history")
chat = ConversationalRetrievalChain.from_llm(
    memory=ConversationBufferMemory(memory_key="chat_history"),
    get_chat_history=lambda h:h,
    ...
)

However, if you want a list of messages from memory to be passed to get_chat_history, add return_messages=True to ConversationBufferMemory, like:

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

It then passes the correct format to get_chat_history.
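To make the second variant concrete, a small sketch (llm and retriever are assumed) of a custom get_chat_history that formats the message objects returned when return_messages=True:

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

def format_chat_history(messages):
    # With return_messages=True the memory hands over HumanMessage/AIMessage objects
    return "\n".join(f"{m.type}: {m.content}" for m in messages)

chat = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=ConversationBufferMemory(memory_key="chat_history", return_messages=True),
    get_chat_history=format_chat_history,
)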

jeet129 commented 1 year ago

I would like to contribute a fix for this. I am unable to find any option to assign this issue to myself - can someone help?

mrctito commented 1 year ago

Dear friends,

I have been having problems for several days and would greatly appreciate some help.

First of all, my Python version is 3.11.4, and my LangChain version is 0.0.224.

I am using ConversationalRetrievalChain with ConversationBufferWindowMemory (or ConversationBufferMemory), but I have tried several combinations, and:

  1. It either doesn't work.
  2. It repeats my question in its response.
  3. It completely duplicates the response.

One thing I noticed is that when I tried to use a suggestion from this forum, which was to use qa_prompt=QA_PROMPT, it gave an error saying that this parameter does not exist.

The CONDENSE PROMPT asks it to rephrase the question, and it seems to be the cause of it repeating my reformulated question.

Has anyone here been able to make this work recently?

Is there an alternative to ConversationalRetrievalChain without the condense_prompt?

Thank you very much!

diyarfaraj commented 1 year ago

chain({ "question": "What was the last question I asked you.", "chat_history": history }, return_only_outputs=True)

where did you get that 'history' variable?

MarkEdmondson1234 commented 1 year ago

edit: fixed now

def qna(question: str, vector_name: str, chat_history=[]):

    logging.debug("Calling qna")

    llm, embeddings, llm_chat = pick_llm(vector_name)

    vectorstore = pick_vectorstore(vector_name, embeddings=embeddings)

    retriever = vectorstore.as_retriever(search_kwargs=dict(k=3))

    prompt = pick_prompt(vector_name)

    logging.basicConfig(level=logging.DEBUG)
    logging.debug(f"Chat history: {chat_history}")
    qa = ConversationalRetrievalChain.from_llm(ChatOpenAI(model="gpt-4", temperature=0.2, max_tokens=5000),
                                               retriever=retriever, 
                                               return_source_documents=True,
                                               verbose=True,
                                               output_key='answer',
                                               combine_docs_chain_kwargs={'prompt': prompt},
                                               condense_question_llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0))

    try:
        result = qa({"question": question, "chat_history": chat_history})
    except Exception as err:
        error_message = traceback.format_exc()
        result = {"answer": f"An error occurred while asking: {question}: {str(err)} - {error_message}"}

    logging.basicConfig(level=logging.INFO)
    return result
yashugupta786 commented 1 year ago

How do you handle scenarios where the current question is a totally new question with no relation to the previous chat history? The standalone question would probably be nonsense, as its semantics get twisted by the chat history.

# .env
OPENAI_API_KEY=xxxxxx
OPENAI_API_BASE=https://xxxxxxxx.openai.azure.com/
OPENAI_API_VERSION=2023-05-15

import os
import openai
from dotenv import load_dotenv
from langchain.chat_models import AzureChatOpenAI
from langchain.embeddings import OpenAIEmbeddings

# Load environment variables (set OPENAI_API_KEY, OPENAI_API_BASE, and OPENAI_API_VERSION in .env)
load_dotenv()

# Configure OpenAI API
openai.api_type = "azure"
openai.api_base = os.getenv('OPENAI_API_BASE')
openai.api_key = os.getenv("OPENAI_API_KEY")
openai.api_version = os.getenv('OPENAI_API_VERSION')

# Initialize gpt-35-turbo and our embedding model
llm = AzureChatOpenAI(deployment_name="gpt-35-turbo")
embeddings = OpenAIEmbeddings(deployment_id="text-embedding-ada-002", chunk_size=1)

from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders import TextLoader
from langchain.text_splitter import TokenTextSplitter

loader = DirectoryLoader('data/qna/', glob="*.txt", loader_cls=TextLoader, loader_kwargs={'autodetect_encoding': True})
documents = loader.load()
text_splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

from langchain.vectorstores import FAISS
db = FAISS.from_documents(documents=docs, embedding=embeddings)

from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate

# Adapt if needed
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template("""Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:""")

qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=db.as_retriever(),
    condense_question_prompt=CONDENSE_QUESTION_PROMPT,
    return_source_documents=True,
    verbose=False,
)
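One hedged option for the "unrelated follow-up" problem (an untested prompt tweak, not an official LangChain fix): instruct the condense step to leave unrelated follow-ups unchanged.

from langchain.prompts import PromptTemplate

CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(
    """Given the following conversation and a follow up question, rephrase the follow up
question to be a standalone question. If the follow up question is unrelated to the
chat history, return it unchanged.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
)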

hemanthkrishna1298 commented 1 year ago

Hi everyone, I have figured out a workaround for sending in the entire concatenated memory into a ConversationalRetrievalChain and bypassing the question condensing chain. This workaround builds on this answer.

First, create a no-op LLM chain that we will use as the question generator. This will directly pass the question to the combine_docs chain, bypassing the question condensation step:

from langchain import LLMChain, PromptTemplate
from langchain.chat_models import ChatOpenAI

class NoOpLLMChain(LLMChain):
   """No-op LLM chain."""

   def __init__(self):
       """Initialize."""
       super().__init__(llm=ChatOpenAI(), prompt=PromptTemplate(template="", input_variables=[]))

   def run(self, question: str, *args, **kwargs) -> str:
       return question

Instantiate a memory object with output_key='answer':

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key='chat_history', output_key='answer', return_messages=True)

Instantiate a convRQA chain using the from_llm method, and replace the default question_generator with our no-op chain:

from langchain.chains import ConversationalRetrievalChain

conv_rqa = ConversationalRetrievalChain.from_llm(llm=llm,
                                                 chain_type="stuff",
                                                 verbose=True,
                                                 memory=memory,
                                                 retriever=retriever,
                                                 return_source_documents=True)

no_op_chain = NoOpLLMChain()
conv_rqa.question_generator = no_op_chain

Now we will modify the default combine_docs_chain system message prompt to include the chat history at the end. You can also modify this prompt to tailor to your use case. We also need to add 'chat_history' as a variable to the ChatPromptTemplate object in combine_docs_chain.llm_chain.prompt:

from langchain.prompts.chat import SystemMessagePromptTemplate

modified_template = "Use the following pieces of context to answer the users question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\n{context}\nChat History:\n{chat_history}"
system_message_prompt = SystemMessagePromptTemplate.from_template(modified_template)
conv_rqa.combine_docs_chain.llm_chain.prompt.messages[0] = system_message_prompt

# add chat_history as a variable to the llm_chain's ChatPromptTemplate object
conv_rqa.combine_docs_chain.llm_chain.prompt.input_variables = ['context', 'question', 'chat_history']

This is working successfully for me. Hope this helps!
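For completeness, a minimal usage sketch of the chain assembled above (the question text is just an illustrative placeholder):

result = conv_rqa({"question": "What does the document say about refunds?"})
print(result["answer"])
print(result["source_documents"])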

clauslang commented 1 year ago

@hemanthkrishna1298 Thanks a lot, this works for me as well!

I'm running into the next challenge now: since my index is pretty big and has a lot of different documents, when I ask a generic follow-up question like "Give more details on the previous answer", my (Pinecone) retriever retrieves a bunch of documents completely unrelated to the previous answers. It seems like the retriever only takes into account the latest question, not the chat history. (It still hallucinates more details, so the answer doesn't sound too bad, but ideally it should of course take a closer look at the documents from the previous answer(s).)

How are you dealing with that?

coreation commented 1 year ago

@clauslang I haven't gotten that far with testing yet, but I'm sure I'll bump into the same thing. Maybe it helps to see what the history property contains, which to my knowledge is used as context to answer the user's question. If the history only contains the latest answer, then that's the problem. So I would first verify your assumption, i.e. "the retriever is only taking into account the latest question (answer?)", and then go from there.

clauslang commented 1 year ago

So it turns out the history property does contain the complete history, but that doesn't matter because ConversationalRetrievalChain simply doesn't pass the history (hidden in the inputs parameter) to the retriever when asking for the relevant documents:

class ConversationalRetrievalChain(BaseConversationalRetrievalChain):
    # ...
    def _get_docs(self, question: str, inputs: Dict[str, Any]) -> List[Document]:
        docs = self.retriever.get_relevant_documents(question)
        return self._reduce_tokens_below_limit(docs)
    # ...

(Might be from a previous langchain version, latest one here.)

My first hacky attempt at also taking into account the history seems to work ok:

from typing import Any, Dict, List

from langchain.chains import ConversationalRetrievalChain
from langchain.schema import Document


class MyConversationalRetrievalChain(ConversationalRetrievalChain):
    def _get_docs(self, question: str, inputs: Dict[str, Any]) -> List[Document]:
        history = '\n\n'.join(['{}: "{}"'.format(message.type, message.content) for message in inputs['chat_history']])
        question_with_history = 'question: "{}"\n\nchat history:\n\n{}'.format(question, history)
        docs = self.retriever.get_relevant_documents(question_with_history)
        return self._reduce_tokens_below_limit(docs)
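A hedged usage sketch for the subclass above (llm and retriever are assumed; the memory uses return_messages=True so inputs['chat_history'] holds message objects with .type and .content):

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True, output_key="answer")
chain = MyConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    return_source_documents=True,
)
result = chain({"question": "Give more details on the previous answer"})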
captainawesome78 commented 12 months ago

So I think the issue here is that the BaseChatMemory gets confused when the output it receives contains more than one key, and it doesn't know which one to assign as the answer; it's in this code here:

if self.output_key is None:
    if len(outputs) != 1:
        raise ValueError(f"One output key expected, got {outputs.keys()}")
    output_key = list(outputs.keys())[0]
else:

When you have return_source_documents=True, the output has two keys: answer and source_documents, and that causes this to throw an error.

The workaround that got this working for me was to specify answer as the output key when creating the ConversationBufferMemory object. Then it doesn't have to guess what the output_key is.

    memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True, output_key='answer')

This is gold!

theekshanamadumal commented 11 months ago

The main reason for these types of issues is the inconsistency of output keys across LangChain chains.

LLMChain -> 'text'
RetrievalQA -> {'question', 'result', 'source_documents'}
ConversationalRetrievalChain -> {'question', 'answer', 'source_documents'}

If you are using memory with these chain types, set the chain's answer key as the memory's output key, for example:

 memory = ConversationBufferMemory(
           memory_key='chat_history', return_messages=True, output_key='answer'
 )

If you are using LangChain agents, the output key is 'output'.

You will need to modify one of:

the OutputParser

the _get_input_output function in class BaseChatMemory

For example, ConversationalRetrievalChain with ZeroShotAgent:

    def _get_input_output(
        self, inputs: Dict[str, Any], outputs: Dict[str, str]
    ) -> Tuple[str, str]:

        if self.input_key is None:
            prompt_input_key = get_prompt_input_key(inputs, self.memory_variables)
        else:
            prompt_input_key = self.input_key

        if self.output_key is None:
            """
            output for agent with LLM chain tool                     = {answer} 
            output for agent with ConversationalRetrievalChain tool  = {'question', 'chat_history', 'answer','source_documents'}
            """

            LLM_key = 'output'
            Retrieval_key = 'answer'
            if isinstance(outputs[LLM_key], dict):
                Retrieval_dict = outputs[LLM_key]
                if Retrieval_key in Retrieval_dict.keys():
                    #output keys are 'answer' , 'source_documents'
                    output = Retrieval_dict[Retrieval_key]
                else:
                    raise ValueError(f"output key: {LLM_key} not a valid dictionary")

            else:
                #otherwise output key will be 'output'
                output_key = list(outputs.keys())[0]
                output = outputs[output_key]

            # if len(outputs) != 1:
            #     raise ValueError(f"One output key expected, got {outputs.keys()}")

        else:
            output_key = self.output_key
            output = outputs[output_key]

        return inputs[prompt_input_key], output
pranavi-shekhar commented 11 months ago

Not sure if this is still relevant, but I found that the best way to obtain chat history with the ConversationalRetrieval/RetrievalQA chain was to use it as a tool with an agent (similar to this: https://python.langchain.com/docs/modules/agents/how_to/agent_vectorstore) or to directly use the Conversational Retrieval agent (https://python.langchain.com/docs/use_cases/question_answering/how_to/conversational_retrieval_agents). This solved all my issues with memory/chat history, especially when integrating with streamlit.
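A minimal sketch of that chain-as-agent-tool pattern (the tool name, description, and qa_chain variable are assumptions for illustration, not from the linked docs):

from langchain.agents import AgentType, Tool, initialize_agent
from langchain.memory import ConversationBufferMemory

# Wrap the retrieval chain as a tool; the agent then manages the chat history itself
doc_tool = Tool(
    name="document_qa",
    func=lambda q: qa_chain({"question": q, "chat_history": []})["answer"],
    description="Answers questions about the indexed documents.",
)

agent = initialize_agent(
    tools=[doc_tool],
    llm=llm,
    agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
    memory=ConversationBufferMemory(memory_key="chat_history", return_messages=True),
    verbose=True,
)
agent.run("Hi, can you summarize the onboarding document?")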

mrctito commented 11 months ago

Thanks, I'll check it out.


tokunbro commented 11 months ago

> (quoting @theekshanamadumal's comment above: if the chain output has only one key, memory picks it up by default; if there is more than one output key, set the chain's relevant output key, e.g. output_key='answer' for ConversationalRetrievalChain)

Not sure why, but I had to set output_key='answer' on the ConversationalRetrievalChain object as well. It wouldn't work for me with it only on the memory object.

jimstechwork commented 11 months ago

How do I maintain user-specific chat_history/memory? When I built a Streamlit app on a Cloud Run service with multiple users asking questions, it seemed to be using the combined chat_history of all users when responding. How should this be handled?
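One hedged way to handle this (a sketch, not an official pattern; llm and retriever are assumed) is to keep a separate memory object per user/session instead of one shared memory:

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# Module-level cache of per-session memories (use an external store in production)
session_memories = {}

def get_memory(session_id: str) -> ConversationBufferMemory:
    if session_id not in session_memories:
        session_memories[session_id] = ConversationBufferMemory(
            memory_key="chat_history", return_messages=True, output_key="answer"
        )
    return session_memories[session_id]

def answer(session_id: str, question: str):
    chain = ConversationalRetrievalChain.from_llm(
        llm=llm,                      # assumed to be defined elsewhere
        retriever=retriever,          # assumed to be defined elsewhere
        memory=get_memory(session_id),
        return_source_documents=True,
    )
    return chain({"question": question})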

NageshMashette commented 10 months ago

@nickmuchi87 please see @ToddKerpelman's answer - add output_key='answer' to the ConversationBufferMemory. This worked for me.

 memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True, output_key='answer')

Yes, it works with output_key='answer'.

abusufyanvu commented 9 months ago

I hope the chain below can solve your issue.

Create a chatbot chain; memory is managed externally:

qa = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model_name=llm_name, temperature=0), 
    chain_type=chain_type, 
    retriever=retriever, 
    return_source_documents=True,
    return_generated_question=True,
)
deesumblip commented 9 months ago

Hi there,

In my chat application, using ConversationBufferMemory I've been able to return either the "source_documents" or the "answer", but not both.

    memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True, output_key = 'source_documents') # works with either "answer" or "source_documents", but not both

    # set up generic retriever
    retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k":8})

    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        memory=memory,
        return_source_documents=True
    )

Any suggestions would be great. Is it possible to initialise two separate ConversationBufferMemories and then somehow manually append "source_documents" to "answer"? Seems hack-y, but I'm interested in trying...
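As noted earlier in the thread, the usual fix is to keep return_source_documents=True on the chain and tell the memory to store only the answer via output_key='answer'; the chain's result dict then still contains both keys. A hedged sketch (llm and retriever assumed):

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history", return_messages=True, output_key="answer"
)
conversation_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    return_source_documents=True,
)
result = conversation_chain({"question": "..."})
# result["answer"] and result["source_documents"] are both available;
# only the answer is written into the memory buffer.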

BidishaAdhikari commented 9 months ago

> (quoting @captainawesome78's comment above in full: set output_key='answer' on the ConversationBufferMemory so BaseChatMemory knows which key to store when return_source_documents=True)

Thank you so much, it solved my problem.

dileepkg commented 8 months ago

Hello all, I am planning to publish my code as a REST API so that the consumer can pass question and chat_history as input parameters. I went through the whole thread and tried multiple suggestions, but nothing worked for setting the initial chat history.

Please let me know if someone has tried publishing ConversationalRetrievalChain as a REST API. Here is my code snippet.

condense_question_prompt = PromptTemplate.from_template("""
Use the following pieces of context and chat history to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Chat history: {chat_history}
Question: {question}
inputVariables: ["question", "chat_history"]
""")

conversation_chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model_name='gpt-3.5-turbo-1106', temperature=0.5),
    retriever=vector_store.as_retriever(),
    get_chat_history=lambda h: h,
    memory=ConversationBufferMemory(memory_key='chat_history', return_messages=True, output_key='answer'),
    chain_type="stuff",
    condense_question_prompt=condense_question_prompt,
)

chat_history = []
chat_history.append(('What is the date Australia was founded', 'Australia was founded in 1901'))

question = "What was my last question"
answer = conversation_chain({"question": question, "chat_history": chat_history}, return_only_outputs=True)
print("answer::", answer)

answer:: {'answer': "I don't know, could you please specify your question?"}
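A likely explanation (hedged, based on how the chain merges memory variables into its inputs): when a memory object is attached, the chain loads chat_history from the memory, and that value overrides the chat_history passed in the call, so the seed history has to go into the memory itself. A sketch, assuming the same llm and vector_store as above:

from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history", return_messages=True, output_key="answer"
)
# Seed the buffer before wiring it into the chain
memory.chat_memory.add_user_message("What is the date Australia was founded")
memory.chat_memory.add_ai_message("Australia was founded in 1901")

conversation_chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo-1106", temperature=0.5),
    retriever=vector_store.as_retriever(),
    memory=memory,
)
result = conversation_chain({"question": "What was my last question"})
print(result["answer"])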

deepak-habilelabs commented 8 months ago

output_key='answer'

@esgdao Can you please share the source code here? It's actually getting very messy.

Kesavan-Balusamy-C commented 7 months ago

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True,output_key='answer') question_generator = LLMChain(llm=llm, prompt=prompt_1) doc_chain = load_qa_with_sources_chain(llm,chain_type="map_reduce") chain = ConversationalRetrievalChain(retriever=me, question_generator=question_generator, combine_docs_chain=doc_chain, return_source_documents=True, memory=memory, verbose=False, rephrase_question=True, return_generated_question=True )

I'm utilizing the map_reduce chain type for its ability to provide pertinent answers along with the source document name. However, I've noticed that answer generation takes quite a long time, typically 20 to 30 seconds. Could anyone offer suggestions on how to speed this up?

deepak-hl commented 6 months ago
def retrieval_qa_chain(query, COLLECTION_NAME, prompt):
    embedding = OpenAIEmbeddings()
    llm = ChatOpenAI(temperature=0.1, model_name="gpt-3.5-turbo-16k", openai_api_key=env('OPENAI_API_KEY'))
    memory = ConversationBufferMemory(llm=llm, output_key='answer', memory_key='chat_history', return_messages=True)
    print(memory.load_memory_variables({}), "--++++++++++-------------------------------------------")
    vector_store = PGVector(
        connection_string=CONNECTION_STRING,
        collection_name=COLLECTION_NAME,
        embedding_function=embedding
    )
    retriever = vector_store.as_retriever(search_kwargs={"k": 3})
    chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        memory=memory,
        chain_type="stuff",
        combine_docs_chain_kwargs={'prompt': prompt},
        retriever=retriever,
        return_source_documents=True,
        get_chat_history=lambda h: h,
        verbose=True)
    return chain

I've been struggling with an issue for the past day, and I would be incredibly grateful if you could assist me. When I run memory.load_memory_variables({}), I get an empty array; it's not able to store the previous conversation and the output is {'chat_history': []}. Could you please help me out? Your assistance would be greatly appreciated. Thank you! @My3VM, @BidishaAdhikari, @pranavi-shekhar @hemanthkrishna1298

sivakumar41 commented 6 months ago

langchain.chains.conversational_retrieval is where ConversationalRetrievalChain lives in the Langchain source code. In that same location is a module called prompts.py which contains both CONDENSE_QUESTION_PROMPT and QA_PROMPT. But there's no mention of qa_prompt in ConversationalRetrievalChain, or its base chain BaseConversationalRetrievalChain, or even its base chain, Base.

That's why I was getting the Pydantic error, qa_prompt extra fields not permitted (type=value_error.extra). qa_prompt is not part of ConversationalRetrievalChain.

A workaround is to insert your custom PromptTemplate into the chain after it's been defined. You have to go very deep into the chain, though. For this example I've defined my prompt as prompt and my chain as chain.

First, import SystemMessagePromptTemplate. Set up your chain as usual, then execute the line below the import:

from langchain.prompts.chat import SystemMessagePromptTemplate
chain.combine_docs_chain.llm_chain.prompt.messages[0] = SystemMessagePromptTemplate(prompt=prompt)

This workaround works for me. Hopefully this will be made easier by the LangChain team in future; if not, I'll just leave it in my code. But in the meantime I now have a fully working ConversationalRetrievalChain with ConversationSummaryBufferMemory and a custom prompt.

Can you provide the full reference?

deepak-hl commented 6 months ago

@sivakumar41 Hi, I am working on a chatbot where a user can create multiple projects, and each project contains the same or different files. The problem is switching from one project's qa object to another project's qa object. What happens is that each time I ask a question, a new chat object is created from ConversationalRetrievalChain, which overwrites the previous memory and starts fresh. Can you please explain how I can overcome this? I'm really stuck on it.

fckfck97 commented 6 months ago

I'm using django-rest in a project with LangChain. I use my Chroma DB and chat over my data, but the problem is that it doesn't keep my chat_history - it gets deleted. I've searched a lot and haven't been able to solve it. Can anyone tell me what I'm doing wrong?

def handle_chat(self, request):
    chat_history = []
    if os.path.exists(self.persist_directory):
        custom_template = """Given the following conversation and a follow-up message, \
            Rephrase the follow-up message into a separate question or instruction that \
            represents the user's intent, add all the necessary context if necessary to generate a complete file and \
            Unambiguous questions or instructions, only based on the story, do not invent messages, take all the questions and process them as one, if applicable, give a single answer if this warrants it. \
            Keep the same language as the follow-up input message.
        Chat History:
        {chat_history}
        Follow Up Input: {question}
        Standalone question or instruction:"""

        memory = ConversationBufferMemory(
            memory_key='chat_history', return_messages=True, output_key='answer')

        vectorstore = Chroma(
            persist_directory=self.persist_directory,
            embedding_function=OpenAIEmbeddings())

        chain = ConversationalRetrievalChain.from_llm(
            llm=ChatOpenAI(model=self.model_name),
            retriever=vectorstore.as_retriever(search_kwargs={"k": 1}),
            memory=memory,
            condense_question_prompt=PromptTemplate.from_template(custom_template),
        )
    else:
        print("No existing index found. Please create one before using this view.")
        vectorstore = None
        chain = None

    if not chain:
        return Response({"error": "Initialization error or index not found."},
                        status=status.HTTP_500_INTERNAL_SERVER_ERROR)

    messages = request.data.get('messages', [])

    if len(messages) > 1:
        aggregated_body = '\n'.join(f"{index + 1}- {msg.get('body')}" for index, msg in enumerate(messages) if msg.get('body'))
        response_body = aggregated_body
    elif len(messages) == 1 and messages[0].get("body"):
        response_body = messages[0].get("body")
    else:
        return Response({"error": "No valid messages found."}, status=status.HTTP_400_BAD_REQUEST)

    if messages:
        response = chain.invoke({"question": response_body, "chat_history": chat_history})
        answer = response.get("answer", "No se encontró respuesta.")
        chat_history.append((response_body, answer))
        print(chat_history)
    else:
        answer = "Lo siento, no tengo una respuesta concreta para tu pregunta."

    return Response({"responses": [{"response": answer}]}, status=status.HTTP_200_OK)
sivakumar41 commented 6 months ago

> (quoting @fckfck97's comment and code above in full)

@fckfck97

You're creating the memory object on every function call. That's why a new object is created each time and it can't remember your chat history. Keep this code outside of the function:

memory = ConversationBufferMemory(
    memory_key='chat_history', return_messages=True, output_key='answer')

ya-smin20 commented 4 months ago

Why is chat_history not working in my case? Here is my code - can someone give me a hand? PS: I'm storing the chat history in a MongoDB database and then retrieving it from there, but it doesn't seem to be integrated with the chain.

def make_chain(session_id, chat_history):
    QA_PROMPT_DOCUMENT_CHAT = """......

    Chat history: {chat_history}
    Context: {context}
    Question: {question}
    input_variables: ["context", "question", "chat_history"]
    Helpful Answer:"""

    # Create a PromptTemplate with our custom template
    custom_prompt = PromptTemplate(
        input_variables=["context", "question", "chat_history"],
        template=QA_PROMPT_DOCUMENT_CHAT,
    )

    model = ChatOpenAI(
        model_name="gpt-3.5-turbo",
        temperature=0.7,
        verbose=True,
    )

    global memory
    memory = ConversationBufferMemory(
        llm=model,
        chat_memory=ChatMessageHistory(messages=[]),
        memory_key="chat_history",
        return_messages=True,
        input_key='question',
        output_key='answer',
        # chat_memory=chat_history
    )

    embeddings = OpenAIEmbeddings()

    vector_store = Chroma(
        collection_name="data",
        embedding_function=embeddings,
        persist_directory="chroma",
    )

    chain = ConversationalRetrievalChain.from_llm(
        llm=model,
        retriever=vector_store.as_retriever(),
        return_source_documents=False,
        combine_docs_chain_kwargs={"prompt": custom_prompt},
        condense_question_prompt=custom_prompt,
        memory=memory,
        # chain_type="retrieval_qa",
        get_chat_history=lambda h: h,
        output_key='answer',
        return_generated_question=True,
        verbose=True,
    )

    system_message_prompt = SystemMessagePromptTemplate.from_template(QA_PROMPT_DOCUMENT_CHAT)
    chain.combine_docs_chain.llm_chain.prompt.messages[0] = system_message_prompt
    chain.combine_docs_chain.llm_chain.prompt.input_variables = ['context', 'question', 'chat_history']

    return chain


def gen_answer(user_input, session_id, chat_history_input=None):
    chat_history_collection = db[f"session{session_id}"]
    chat_history_docs = chat_history_collection.find()

    chat_history = []
    for doc in chat_history_docs:
        role = doc["role"]
        content = doc["content"]
        if role == "human":
            chat_history.append(HumanMessage(content=content))
        elif role == "ai":
            chat_history.append(AIMessage(content=content))

    chain = make_chain(session_id=session_id, chat_history=chat_history)
    question = user_input

    # Generate answer
    response = chain({"question": question, "chat_history": chat_history})
    chat_history.append(HumanMessage(content=question))

    # Print out the fetched chat history for debugging
    print("Fetched chat history:", chat_history)

    # Print out the response for debugging
    print("Response:", response)

    answer = response["answer"]
    return answer

amitoshacharya commented 1 month ago

> (quoting the original post above in full)

This solution works for me for adding chat history:

chat_history = ['user_query', 'ai_response']

if chat_history:
    memory = ConversationBufferWindowMemory(memory_key="chat_history", output_key="answer", return_messages=True)
    memory.chat_memory.add_user_message(chat_history[0])
    memory.chat_memory.add_ai_message(chat_history[1])
else:
    memory = ConversationBufferWindowMemory(k=0, memory_key="chat_history", output_key="answer", return_messages=True)

print(memory)

chain = ConversationalRetrievalChain.from_llm(
    llm=model,
    retriever=retriever,
    memory=memory,
    get_chat_history=lambda h: h,
    return_source_documents=True,
    combine_docs_chain_kwargs={'prompt': prompt_template},
    verbose=verbose,
)

response = chain.invoke({"question": user_question, "chat_history": chat_history})