langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

ConversationalRetrievalChain + Memory #2303

Open da-bu opened 1 year ago

da-bu commented 1 year ago

Hi,

I'm following the Chat index examples and was surprised that the history is not a Memory object but just an array. However, it is possible to pass a memory object to the constructor, if

  1. I also set memory_key to 'chat_history' (default key names are different between ConversationBufferMemory and ConversationalRetrievalChain)
  2. I also adjust get_chat_history to pass through the history from the memory, i.e. lambda h : h.

This is what that looks like:

memory = ConversationBufferMemory(memory_key='chat_history', return_messages=False)
conv_qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm, 
    retriever=retriever, 
    memory=memory,
    get_chat_history=lambda h : h)

Now, my issue is that if I also want to return sources that doesn't work with the memory - i.e. this does not work:

memory = ConversationBufferMemory(memory_key='chat_history', return_messages=False)
conv_qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm, 
    retriever=retriever, 
    memory=memory,
    get_chat_history=lambda h : h,
    return_source_documents=True)

The error message is "ValueError: One output key expected, got dict_keys(['answer', 'source_documents'])".

Maybe I'm doing something wrong? If not, this seems worth fixing to me - or, more generally, making memory and ConversationalRetrievalChain more directly compatible.

xyfusion commented 1 year ago

I am having a similar issue. Memory with ChatOpenAI works fine for the Conversation chain, but it is not fully compatible with ConversationalRetrievalChain. Looking forward to a working solution on this, given that retrieval is a common use case in conversation chains.

malcolmosh commented 1 year ago

A chat_history object consisting of (user, human) string tuples passed to the ConversationalRetrievalChain.from_llm method will automatically be formatted through the _get_chat_history function. In a chatbot, you can simply keep appending inputs and outputs to the chat_history list and use it instead of ConversationBufferMemory. This chat_history list will be nicely formatted in the prompt sent to an LLM. Though I agree that the overall concept of memory should be applicable everywhere and things should be harmonized...
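
A minimal sketch of that pattern, assuming llm and retriever are already set up (the names here are illustrative):

from langchain.chains import ConversationalRetrievalChain

# Manage chat_history yourself instead of attaching a Memory object (sketch)
chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever)

chat_history = []  # list of (question, answer) string tuples
question = "What is the date Australia was founded?"
result = chain({"question": question, "chat_history": chat_history})
chat_history.append((question, result["answer"]))  # append each turn for the next call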

jordanparker6 commented 1 year ago

A chat_history object consisting of (user, human) string tuples passed to the ConversationalRetrievalChain.from_llm method will automatically be formatted through the _get_chat_history function. In a chatbot, you can simply keep appending inputs and outputs to the chat_history list and use it instead of ConversationBufferMemory. This chat_history list will be nicely formatted in the prompt sent to an LLM. Though I agree that the overall concept of memory should be applicable everywhere and things should be harmonized...

I have been playing around with this today. The _get_chat_history function stuffs the provided list of history tuples into a preliminary query that reformulates the input as a new question, using the following prompt:

CONDENSE_QUESTION_TEMPLATE = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""

It then runs the regular retrieval prompt, providing both the context retrieved from the retriever and the summarised question (given the history). This approach to conversational memory broke down pretty quickly when asked questions about past inputs. Taking the history and summarising it into a new question seems to create a mismatch between the new question and the context. For example:

input

history = [
    ("What is the date Australia was founded.", "Australia was founded in 1901."),
]

chain({ "question": "What was the last question I asked you.", "chat_history": history }, return_only_outputs=True)

logs

DEBUG:openai:api_version=None data='{"messages": [{"role": "system", "content": "Use the following pieces of context to answer the users question. \\nIf you don\'t know the answer, just say that you don\'t know, don\'t try to make up an answer.\\n {<CONTEXT FROM DOCUMENTS RETRIEVED>}"}, {"role": "user", "content": "Can you remind me of the last question I asked you?"}], "model": "gpt-3.5-turbo", "max_tokens": null, "stream": false, "n": 1, "temperature": 0}' message='Post details'

This returned 'Your last question was "What are the pieces of context that can be used to answer the user\'s question?"'

Which is a summary of the QA_Prompt template itself...

Would following the ChatOpenAI API, passing a list of the raw messages with the history injected, avoid this? It would be something like the windowed conversational memory buffer. The summarisation into a new question may be doing a disservice for basic conversational memory, since the conversation itself isn't provided as context...
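
For what it's worth, the windowed-buffer idea can be sketched with a plain ConversationChain and ConversationBufferWindowMemory, which re-injects the last k raw turns instead of condensing them into a standalone question (an illustration of the concept only, not a fix for ConversationalRetrievalChain):

from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferWindowMemory

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
window_memory = ConversationBufferWindowMemory(k=3)  # keep only the last 3 turns

conversation = ConversationChain(llm=llm, memory=window_memory)
conversation.predict(input="What is the date Australia was founded?")
print(conversation.predict(input="What was the last question I asked you?"))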

ogmios2 commented 1 year ago

Definitely issues. I just spent two days racking my brain trying to make a "chain" of Pinecone retrieval, a prompt template, and chat history - which you'd think would be easy, since the whole purpose of LangChain is to have the various blocks or pieces of chains work well together.

It seems that ConversationalRetrievalChain does not work with context, history and a prompt template all at once. It semi-works with some combinations, but not as a whole.

ToddKerpelman commented 1 year ago

So I think the issue here is that BaseChatMemory gets confused when the output it receives contains more than one key and it doesn't know which one to assign as the answer; it's in this code here:

if self.output_key is None:
    if len(outputs) != 1:
        raise ValueError(f"One output key expected, got {outputs.keys()}")
    output_key = list(outputs.keys())[0]
else:
When you have return_source_documents=True, the output has two keys: answer and source_documents, and that causes this to throw an error.

The workaround that got this working for me was to specify answer as the output key when creating this ConversationBufferMemory object. Then it doesn't have to try to guess at what the output_key is.

    memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True, output_key='answer')

My3VM commented 1 year ago

I too had similar issue, thanks for this response!

amitmukh commented 1 year ago

I have a similar issue:

qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    chain_type="stuff",
    retriever=index.as_retriever(),
    return_source_documents=True,
    memory=st.session_state.memory)

In fact, if I remove the return_source_documents=True line then I get another error: pydantic.error_wrappers.ValidationError: 1 validation error for AIMessage content str type expected (type=type_error.str)

nickmuchi87 commented 1 year ago

I have the same issue as well. Has anyone managed to use that chain with memory, a custom prompt and return_source_documents? I was getting errors. I added my prompt template under qa_prompt and got errors.

jeloooooo commented 1 year ago

@nickmuchi87 please see @ToddKerpelman 's answer, add the output_key='answer' in the ConversationBufferMemory. This worked for me.

 memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True, output_key='answer')

nickmuchi87 commented 1 year ago

Yes, that worked when I did not have a custom prompt, but when I tried to include a prompt I got a context error.

tevslin commented 1 year ago

I do not have a custom prompt and the fix above didn't work for me.

amitmukh commented 1 year ago

Same here. I also don't have a custom prompt and the fix didn't work for me either.

My3VM commented 1 year ago

I guess one could just use the default QA_PROMPT if one has no requirement for prompt customisation.

from langchain.chains.conversational_retrieval.prompts import QA_PROMPT

memory = ConversationSummaryMemory(
    llm=OpenAI(model_name='gpt-3.5-turbo'),
    memory_key='chat_history',
    return_messages=True,
    output_key='answer')

model = ConversationalRetrievalChain.from_llm(
    llm=OpenAI(model_name='gpt-3.5-turbo'),
    retriever=retriever,
    memory=memory,
    return_source_documents=True,
    get_chat_history=lambda h: h,
    qa_prompt=QA_PROMPT)  # use a custom prompt here if one needs customisation

tevslin commented 1 year ago

Thank you. This worked for me, although I got an error passing an llm parameter to the memory constructor and had to take it out.

Other differences in my test (although I'm not sure they matter): I am using gpt-4, I did not include the qa_prompt parameter, and I did not specify return_source_documents.

My guess is that the lambda function for get_chat_history is making the difference, since the error occurs in _get_chat_history. I did verify that memory is actually working by having my second query refer back to the answer of the first query using only a pronoun.

muhammadsr commented 1 year ago

I am still struggling. It does not remember anything in the chat history. What am I doing wrong? Here is my code:

    memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True, output_key='answer')

    chain = ConversationalRetrievalChain.from_llm(
        llm=ChatOpenAI(model_name=model_name, temperature=0.7),
        memory=memory,
        qa_prompt=QA_PROMPT,
        retriever=vectors.as_retriever(),
    )

    result1 = chain({"question": "What was the last question I asked you?", "chat_history": [
        ("What is the date Australia was founded.", "Australia was founded in 1901."),
    ]}, return_only_outputs=True)

    print('Response: ', result1["answer"])

Response: I'm sorry, you did not ask me a question. Is there anything I can help you with?

My3VM commented 1 year ago

Response: I'm sorry, you did not ask me a question. Is there anything I can help you with?

Have you tried providing get_chat_history, as shown above?

esgdao commented 1 year ago

I'm also trying to get ConversationalRetrievalChain and ConversationSummaryBufferMemory to work together. I've tried all of the above. I'm at the point where I get no errors and I can see the chat history in the response. But there is only ever one pair of entries in the history. I call the ConversationalRetrievalChain with response = chain({"question": query}). I've seen examples where "chat_history": [] is also passed in to the chain but that doesn't make sense to me. Is there anything in here that I should be doing differently? Everything else works perfectly.

ogmios2 commented 1 year ago

For anyone having issues: I was able to get something working using a different approach. Take a look at my code at https://github.com/ogmios2/lsvtChatBot and see if it fits the need for any of you.

esgdao commented 1 year ago

Thank you, @ogmios2. I'll take a look.

I just did a test without Streamlit and got multiple entries in the chat history, which is what I was looking for. So it seems that Streamlit's behaviour of running the whole Python script from the top is resetting everything (duh!). It looks like I have a working combination of Pinecone and ConversationalRetrievalChain, with ConversationSummaryBufferMemory providing chat history. I am not yet using a custom prompt, however.

Now I just have to figure out a way around this Streamlit issue. If anyone is interested I could post the minimum viable code to show this working. You would need your own Pinecone account and index, of course.

esgdao commented 1 year ago

I know it's a bit off-topic, but for those using Streamlit with Langchain, you probably need to cache things like ConversationSummaryBufferMemory so that they're not obliterated every time someone submits a form or otherwise re-runs your Python script.

@st.cache_resource
def init_memory():
    return ConversationSummaryBufferMemory(
        llm=llm,
        output_key='answer',
        memory_key='chat_history',
        return_messages=True)
memory = init_memory()

nickmuchi87 commented 1 year ago

Please share your min viable code?

nickmuchi87 commented 1 year ago

when I add a custom prompt I get the below error:

ValidationError: 1 validation error for ConversationalRetrievalChain qa_prompt extra fields not permitted (type=value_error.extra)

Code:

                                  output_key='answer',
                                  return_messages=True)

def load_prompt():
    system_template = """You are an expert in finance, economics, investing, ethics, derivatives and markets. 
    Use the following pieces of context to answer the users question. If you don't know the answer, 
    just say that you don't know, don't try to make up an answer. Provide a source reference. ALWAYS return a "sources" part in your answer.
    The "sources" part should be a reference to the source of the documents from which you got your answer. 
    Remember to only use the given context to answer the question, very important.

    Question: {question}
    Begin!
    ----------------
    {context}"""

    messages = [
        SystemMessagePromptTemplate.from_template(system_template),
        HumanMessagePromptTemplate.from_template("{question}")
    ]
    prompt = ChatPromptTemplate.from_messages(messages)

    return prompt

Prompt Template & Messages

_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""

CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

chain = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(temperature=0, model_name="gpt-4"),
    retriever=cfa_db.as_retriever(search_kwargs={"k": 3}),
    qa_prompt=load_prompt(),
    condense_question_prompt=CONDENSE_QUESTION_PROMPT,
    return_source_documents=True,
    memory=memory)

chain({"question": "What is an MBS"})

esgdao commented 1 year ago

Please share your min viable code?

Here is a working example using a Pinecone vector store, ConversationalRetrievalChain, and ConversationSummaryBufferMemory. It makes three queries, printing the chat history after each. You will need your own Pinecone index for this to work as-is, asking relevant questions of your own store. I am going to see if I can include a custom prompt today, though from previous messages here I'm sure I'll run into issues.

import pinecone
from keys import OPENAI_API_KEY, PINECONE_API_KEY, PINECONE_ENV
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.vectorstores import Pinecone
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationSummaryBufferMemory

embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_ENV)
vectorstore = Pinecone.from_existing_index("index", embeddings)

llm = ChatOpenAI(
    temperature=0.0,
    model_name="gpt-3.5-turbo",
    openai_api_key=OPENAI_API_KEY)

memory = ConversationSummaryBufferMemory(
    llm=llm,
    output_key='answer',
    memory_key='chat_history',
    return_messages=True)

retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4, "include_metadata": True})

chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    memory=memory,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    get_chat_history=lambda h : h,
    verbose=False)

response = chain({"question": "How are you today?"})
print(f"{response['chat_history']}\n")

response = chain({"question": "Can you help me understand ESG?"})
print(f"{response['chat_history']}\n")

response = chain({"question": "What is ARC's potential hurt approach?"})
print(f"{response['chat_history']}\n")

tevslin commented 1 year ago

This technique works very well for me. You can also test whether a variable name is already in locals() before setting it, which is essentially the same thing.
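
In Streamlit, st.session_state is another way to do the same check - a small sketch, assuming llm is already defined:

import streamlit as st
from langchain.memory import ConversationSummaryBufferMemory

# Persist the memory object across Streamlit reruns (llm is assumed to exist)
if "memory" not in st.session_state:
    st.session_state["memory"] = ConversationSummaryBufferMemory(
        llm=llm,
        output_key="answer",
        memory_key="chat_history",
        return_messages=True)
memory = st.session_state["memory"]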

esgdao commented 1 year ago

when I add a custom prompt I get the below error:

ValidationError: 1 validation error for ConversationalRetrievalChain qa_prompt extra fields not permitted (type=value_error.extra)

I get the same error as @nickmuchi87 now. I was able to add condense_question_prompt to my chain definition with no issues, but when I add qa_prompt I get the Pydantic error as above. It doesn't seem to matter whether it's a custom definition or the template value from importing Langchain's default QA_PROMPT, as suggested by @My3VM.

esgdao commented 1 year ago

langchain.chains.conversational_retrieval is where ConversationalRetrievalChain lives in the Langchain source code. In that same location is a module called prompts.py which contains both CONDENSE_QUESTION_PROMPT and QA_PROMPT. But there's no mention of qa_prompt in ConversationalRetrievalChain, or its base chain BaseConversationalRetrievalChain, or even its base chain, Base.

That's why I was getting the Pydantic error, qa_prompt extra fields not permitted (type=value_error.extra). qa_prompt is not part of ConversationalRetrievalChain.

A workaround is to insert your custom PromptTemplate into the chain after it's been defined. You have to go very deep into the chain, though. For this example I've defined my prompt as prompt and my chain as chain.

First, import SystemMessagePromptTemplate. Set up your chain as usual, then execute the line below the import:

from langchain.prompts.chat import SystemMessagePromptTemplate
chain.combine_docs_chain.llm_chain.prompt.messages[0] = SystemMessagePromptTemplate(prompt=prompt)

This workaround works for me. Hopefully this will be made easier by the LangChain team in future; if not, I'll just leave it in my code. In the meantime I now have a fully working ConversationalRetrievalChain with ConversationSummaryBufferMemory and a custom prompt.
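
A slightly fuller sketch of that patch, assuming the chain was built with ConversationalRetrievalChain.from_llm and a chat model (so combine_docs_chain.llm_chain.prompt is a ChatPromptTemplate); the template text here is only illustrative:

from langchain.prompts import PromptTemplate
from langchain.prompts.chat import SystemMessagePromptTemplate

# Illustrative system prompt; replace with your own wording
system_template = """Use the following pieces of context to answer the user's question.
If you don't know the answer, just say that you don't know.
----------------
{context}"""

prompt = PromptTemplate(template=system_template, input_variables=["context"])

# Swap the default QA system message for the custom one, deep inside the chain
chain.combine_docs_chain.llm_chain.prompt.messages[0] = SystemMessagePromptTemplate(prompt=prompt)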

nickmuchi87 commented 1 year ago

that worked for me as well, thank you!

ATuxedo commented 1 year ago

I have the same question. I tried the above method, but it doesn't help.

yysturdy commented 1 year ago

I am still struggling. It does not remember anything in the chat history. What am I doing wrong? Here is my code:

    memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True, output_key='answer')

    chain = ConversationalRetrievalChain.from_llm(
        llm=ChatOpenAI(model_name=model_name, temperature=0.7),
        memory=memory,
        qa_prompt=QA_PROMPT,
        retriever=vectors.as_retriever(),
    )

    result1 = chain({"question": "What was the last question I asked you?", "chat_history": [
        ("What is the date Australia was founded.", "Australia was founded in 1901."),
    ]}, return_only_outputs=True)

    print('Response: ', result1["answer"])

Response: I'm sorry, you did not ask me a question. Is there anything I can help you with?

I've also run into this problem. Have you managed to solve it?

My3VM commented 1 year ago

What I figured out is that the qa_prompt key for ConversationalRetrievalChain worked with older versions of LangChain (0.0.155 or earlier). It has been removed in later versions.

Yet another workaround I'm employing, using slightly different components to achieve the same thing:

question_generator = LLMChain(
    llm=OpenAI(model_name='gpt-3.5-turbo', temperature=0),
    prompt=CONDENSE_QUESTION_PROMPT)

doc_chain = load_qa_chain(
    OpenAI(model_name='gpt-3.5-turbo', temperature=0, max_tokens=250),
    chain_type="stuff",
    prompt=CUSTOM_QA_PROMPT)

model = ConversationalRetrievalChain(
    retriever=retriever,
    question_generator=question_generator,
    combine_docs_chain=doc_chain,
    memory=memory,
    return_source_documents=True,
    get_chat_history=lambda h: h)

You can provide custom prompts to both the question_generator and the QA chain as applicable. So far this seems to be working and I hope it continues to do so.
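
For completeness, the imports the snippet above relies on are roughly these (CUSTOM_QA_PROMPT, retriever and memory are assumed to be defined elsewhere):

from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain, LLMChain
from langchain.chains.question_answering import load_qa_chain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT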

esgdao commented 1 year ago

That's pretty much exactly what I ended up doing. And it works without a hitch.

Anup-Deshmukh commented 1 year ago

@esgdao could you please share your Streamlit code with ConversationalRetrievalChain + ConversationSummaryBufferMemory + custom prompt

MarkEdmondson1234 commented 1 year ago

I solved it like this on langchain==0.0.176

This has happened to me a few times and I always end up here; the easiest workaround is not to use run:

...
result = qa({"question": "do you know anything about coor?", 
             "chat_history": [
        ("What is the date Australia was founded.", "Australia was founded in 1901.")]})

e.g. don't do:


result = qa.run({"question": "do you know anything about coor?", 
             "chat_history": [
        ("What is the date Australia was founded.", "Australia was founded in 1901.")]})
# raise ValueError(
ValueError: `run` not supported when there is not exactly one output key. Got ['answer', 'source_documents'].
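
With return_source_documents=True the call returns a dict, so the outputs can be read individually - a small sketch mirroring the example above:

result = qa({"question": "do you know anything about coor?",
             "chat_history": [
        ("What is the date Australia was founded.", "Australia was founded in 1901.")]})
print(result["answer"])
for doc in result["source_documents"]:
    print(doc.metadata)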

yysturdy commented 1 year ago
chain.combine_docs_chain.llm_chain.prompt.messages[0] = SystemMessagePromptTemplate(prompt=prompt)

Hello, could you share your code?

talhaanwarch commented 1 year ago

I am still struggling. It does not remember anything in the chat history. What am I doing wrong? Here is my code:

    memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True, output_key='answer')

    chain = ConversationalRetrievalChain.from_llm(
        llm=ChatOpenAI(model_name=model_name, temperature=0.7),
        memory=memory,
        qa_prompt=QA_PROMPT,
        retriever=vectors.as_retriever(),
    )

    result1 = chain({"question": "What was the last question I asked you?", "chat_history": [
        ("What is the date Australia was founded.", "Australia was founded in 1901."),
    ]}, return_only_outputs=True)

    print('Response: ', result1["answer"])

Response: I'm sorry, you did not ask me a question. Is there anything I can help you with?

@muhammadsr did you find a solution?

sum-coderepo commented 1 year ago

So I think the issue here is that BaseChatMemory gets confused when the output it receives contains more than one key and it doesn't know which one to assign as the answer; it's in this code here:

if self.output_key is None:
    if len(outputs) != 1:
        raise ValueError(f"One output key expected, got {outputs.keys()}")
    output_key = list(outputs.keys())[0]
else:

When you have return_source_documents=True, the output has two keys: answer and source_documents, and that causes this to throw an error.

The workaround that got this working for me was to specify answer as the output key when creating this ConversationBufferMemory object. Then it doesn't have to try to guess at what the output_key is.

    memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True, output_key='answer')

I am still facing the issue even after adding output_key='answer':

 def conversationChain(self):
        llm = ChatOpenAI(deployment_name="gpt-35-turbo", temperature=0.9)
        retriever = self.vector_store.as_retriever()

        memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True, output_key='answer')

        chain = ConversationalRetrievalChain.from_llm(llm, retriever, return_source_documents = True, memory=memory)

        return chain

samthedataman commented 1 year ago

A chat_history object consisting of (user, human) string tuples passed to the ConversationalRetrievalChain.from_llm method will automatically be formatted through the _get_chat_history function. In a chatbot, you can simply keep appending inputs and outputs to the chat_history list and use it instead of ConversationBufferMemory. This chat_history list will be nicely formatted in the prompt sent to an LLM. Though I agree that the overall concept of memory should be applicable everywhere and things should be harmonized...

I have been playing around with this today. The _get_chat_history function stuffs the provided list of history tuples into a preliminary query that reformulates the input as a new question, using the following prompt:

CONDENSE_QUESTION_TEMPLATE = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""

It then runs the regular retrieval prompt, providing both the context retrieved from the retriever and the summarised question (given the history). This approach to conversational memory broke down pretty quickly when asked questions about past inputs. Taking the history and summarising it into a new question seems to create a mismatch between the new question and the context. For example:

input

history = [
    ("What is the date Australia was founded.", "Australia was founded in 1901."),
]

chain({ "question": "What was the last question I asked you.", "chat_history": history }, return_only_outputs=True)

logs

DEBUG:openai:api_version=None data='{"messages": [{"role": "system", "content": "Use the following pieces of context to answer the users question. \\nIf you don\'t know the answer, just say that you don\'t know, don\'t try to make up an answer.\\n {<CONTEXT FROM DOCUMENTS RETRIEVED>}"}, {"role": "user", "content": "Can you remind me of the last question I asked you?"}], "model": "gpt-3.5-turbo", "max_tokens": null, "stream": false, "n": 1, "temperature": 0}' message='Post details'

This returned 'Your last question was "What are the pieces of context that can be used to answer the user\'s question?"'

Which is a summary of the QA_Prompt template itself...

Would following the ChatOpenAI API of a list of the raw messages with the history injected avoid this? It is kind of like the windowed conversational memory buffer? The summarisation into a new question may be doing a disservice for answering basic conversational memory as the conversation isn't provided as context...

can you post your code or a link to your code?

dcellison commented 1 year ago

ConversationalRetrievalChain returns the chat history in its response. If it is not using the history for answering the question, then why is it even in the response?

talhaanwarch commented 1 year ago

ConversationalRetrievalChain returns the chat history in its response. If it is not using the history for answering the question, then why is it even in the response?

That's weird. If you set verbose=True, you can see the chat history there, which means it has access to it. But the prompt reformulates the question into a standalone question - that's why it can't use the chat history. In #5984 you can see it has access to the history, but the formulation of the follow-up question is not correct. I tried to modify both prompts, but it's still not working.

talhaanwarch commented 1 year ago

I think #5572 is an attempt to fix this issue, so make sure you have an updated version.

samthedataman commented 1 year ago

I think #5572 is an attempt to fix this issue, so make sure you have an updated version.

Can someone post just bare-bones Python code that exemplifies calling the OpenAI API, using this method, and printing the chat history + answers?

I'm confused why this is so confusing. This is my current code

condense_prompt = PromptTemplate.from_template()
combine_docs_custom_prompt_og = PromptTemplate.from_template()

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

chain = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(temperature=0.3),
    vectordb.as_retriever(),  # see below for vectorstore definition
    memory=memory,
    condense_question_prompt=condense_prompt,
    combine_docs_chain_kwargs=dict(prompt=combine_docs_custom_prompt),
)

how do you return the chat history?

talhaanwarch commented 1 year ago

This is how I do it:

chain = ConversationalRetrievalChain.from_llm(llm,
                                        retriever=retriever.as_retriever(),
                                        memory=memory,
                                        chain_type="stuff",
                                        return_source_documents=True,
                                        verbose=True,
                                        condense_question_prompt = prompt_chat,
                                        return_generated_question=True,
                                        get_chat_history=get_chat_history,
                                        combine_docs_chain_kwargs={"prompt": prompt_doc})

Here is how you can see the chat history:

def get_chat_history(inputs):
    inputs = [i.content for i in inputs]
    return  '\n'.join(inputs)

you can also do get_chat_history=lambda h:h

samthedataman commented 1 year ago

This is great, but can you show how you're generating the result and printing the history?

This is how I do it:

chain = ConversationalRetrievalChain.from_llm(llm,
                                        retriever=retriever.as_retriever(),
                                        memory=memory,
                                        chain_type="stuff",
                                        return_source_documents=True,
                                        verbose=True,
                                        condense_question_prompt = prompt_chat,
                                        return_generated_question=True,
                                        get_chat_history=get_chat_history,
                                        combine_docs_chain_kwargs={"prompt": prompt_doc})

Here is how you can see the chat history:

def get_chat_history(inputs):
    inputs = [i.content for i in inputs]
    return  '\n'.join(inputs)

you can also do get_chat_history=lambda h:h

This looks correct, but now how do I access the history itself?

talhaanwarch commented 1 year ago

@samthedataman try print(memory)
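
A few ways to inspect what the buffer memory holds (a small sketch, assuming memory is the ConversationBufferMemory attached to the chain):

print(memory.buffer)                     # the raw buffer
print(memory.chat_memory.messages)       # list of HumanMessage / AIMessage objects
print(memory.load_memory_variables({}))  # what gets injected into the prompt as chat_history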

samthedataman commented 1 year ago

Where is a working Streamlit example of this with LangChain and memory? I keep seeing bits of code everywhere.

samthedataman commented 1 year ago

How do I assign this to the {chat_history} in the prompt template?

condense_prompt = PromptTemplate.from_template("{chat_history}")

talhaanwarch commented 1 year ago

You don't need to. If you are defining it as

ConversationalRetrievalChain.from_llm(llm,memory=memory)

it will pick it up by itself. But here is how you can do it:

prompt_template_doc = """
Use the following pieces of context to answer the question at the end.
{context}
If you still cant find the answer, just say that you don't know, don't try to make up an answer.
You can also look into chat history.
{chat_history}
Question: {question}
Answer:
"""


prompt_doc = PromptTemplate(
    template=prompt_template_doc,
    input_variables=["context", "question", "chat_history"]
)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True,output_key='answer')

and pass it as

chain = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0),
    retriever=docsearch,
    memory=memory,
    combine_docs_chain_kwargs={"prompt": prompt_doc},
)
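
A usage sketch for the setup above (assuming docsearch is a retriever and memory is the ConversationBufferMemory defined earlier):

res1 = chain({"question": "What is the date Australia was founded?"})
res2 = chain({"question": "What was the last question I asked you?"})
print(res2["answer"])
print(memory.load_memory_variables({})["chat_history"])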

If it's still not solved, I would suggest opening a new discussion with your code and mentioning me there.

samthedataman commented 1 year ago

This exact code is not saving memory, and this is exactly how you've shown to implement the chatbot. It is still NOT saving memory as it should. Do you have any idea @talhaanwarch where I'm going wrong here?


def retreive_best_answer(full_user_question: str):

    openai.api_key = os.getenv("OPEN_API_KEY")
    embeddings = OpenAIEmbeddings()
    llm = OpenAI(temperature=0.1)
    vectordb = FAISS.load_local("merged_faiss_index", embeddings)

    prompt_template_doc = """
        Use the following pieces of context to answer the question at the end.
        {context}
        If you still cant find the answer, just say that you don't know, don't try to make up an answer.
        You can also look into chat history.
        {chat_history}
        Question: {question}
        Answer:
        """

    prompt_doc = PromptTemplate(
        template=prompt_template_doc, input_variables=["context", "question", "chat_history"])

    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True,output_key='answer')
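    # (note: this ConversationBufferMemory is created fresh on every call to this function)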

    # df_loader = DataFrameLoader(df, page_content_column="full_text")
    # df_docs = df_loader.load()

    # faiss_db = FAISS.from_documents(df_docs, embeddings)

    qa = ConversationalRetrievalChain.from_llm(
        OpenAI(temperature=0.1),
        vectordb.as_retriever(),
        memory=memory,
       combine_docs_chain_kwargs={"prompt": prompt_doc})

    results = qa({"question": full_user_question})

    return results["answer"]

talhaanwarch commented 1 year ago

What do you mean by saving memory? Saving memory where?

samthedataman commented 1 year ago

Saving memory = saving the chat history and using it to answer future questions. If you read this image from the bottom up: first I ask what the best treatments for borderline personality disorder are, and it gives correct results. Then when I ask what symptoms people have, it returns random symptoms relating to COVID-19 (the wrong disease in context), which reflects that memory is not being re-ingested into the model's input prompts.

[Screenshot: Screen Shot 2023-06-12 at 10 44 45 PM]