langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Memory is not implemented correctly in non-indexed data #8748

Closed alisola21 closed 7 months ago

alisola21 commented 11 months ago

Reproduction

I would like to report a problem I am experiencing using memory, in particular ConversationBufferMemory.

I am developing a conversational agent capable of correctly answering very technical questions contained in documentation. The goal is to have answers generated primarily from the indexed documents, falling back on the model's own knowledge only when the answer is not contained in the data. The indexed data is the OpenSearch documentation, collected from the web using scraping techniques. I then created embeddings using OpenAI's embeddings and indexed the data in the vector store, following the instructions provided by the documentation.

Finally, I created the conversational agent using ConversationalRetrievalChain, which takes as input the prompt, the memory (ConversationBufferMemory), the model (gpt-3.5-turbo), and a retriever based on the indexed data.


# imports
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import ConversationalRetrievalChain
from langchain.chains.question_answering import load_qa_chain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain.vectorstores import OpenSearchVectorSearch

# openai embeddings
embeddings = OpenAIEmbeddings()

# vector store index
docsearch = OpenSearchVectorSearch(
    opensearch_url="https://admin:admin@localhost:9200",
    is_aoss=False,
    verify_certs=False,
    index_name=["haystack_json", "opensearch_json"],
    embedding_function=embeddings,
)

#prompt
template = """Answer the question truthfully based mainly on the given documents. 
    If the question cannot be answered from the given documents, search for the answer in your own knowledge.
    Use an unbiased and professional tone. Do not repeat text.

    Previous conversation:{chat_history}

    Documents:{context}
    Question:{question}
    Answer:
    """
QA = PromptTemplate(input_variables=["context", "question", "chat_history"], template=template)

# memory, llm and chain
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, streaming=True, callbacks=[StreamingStdOutCallbackHandler()])
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True, input_key="question", output_key="answer")
chain = load_qa_chain(llm, chain_type="stuff", memory=memory)  # note: this chain is built but never invoked below

qa_chain = ConversationalRetrievalChain.from_llm(llm, retriever=docsearch.as_retriever(), condense_question_prompt=QA, memory=memory, return_source_documents=True)

# print results
user_input = input("Hello! Ask me what you want!")
result = qa_chain({"question": user_input})
print(result)

Expected behavior

Testing the code with questions about the OpenSearch documentation, the results are correct and memory seems to work: the model can tell that the question "and with Tarball?" refers to the installation of OpenSearch.

Q: How to install Opensearch with Docker? 

A: To install OpenSearch with Docker, you can follow these steps:

1. Make sure Docker is installed and working correctly on your system.
2. Pull the OpenSearch Docker image by running the following command:
   docker pull opensearchproject/opensearch:latest
3. Create a Docker Compose file (docker-compose.yml) appropriate for your environment. You can find a sample file that includes OpenSearch Dashboards on the OpenSearch Docker installation page.
4. Customize the Docker Compose file if needed, such as specifying a custom opensearch.yml or opensearch_dashboards.yml.
5. Run the Docker Compose command to start the OpenSearch containers:
   docker-compose up

Q: and with Tarball? 

A: With the Tarball method, the installation of OpenSearch involves the following steps:

1. Download the OpenSearch tarball package.
2. Unpack the tarball to a desired location on your system.
3. Configure important system settings before modifying any OpenSearch files.
4. (Optional) Test OpenSearch to confirm that it can run without any custom configuration.
5. Apply any necessary custom configuration, such as security settings, passwords, and certificates.

However, when asked questions unrelated to the indexed data (e.g., how to install Microsoft Word and PowerPoint), the model answers the first question correctly but does not retain the memory: it gives no instructions on installing PowerPoint and says it needs further clarification.

The only way to get a correct answer is to rephrase the question similarly to the previous one (How to install Power Point?).

Q: How to install Word?
A: To install Microsoft Word, you can follow these steps:

1. Purchase a Microsoft Office subscription or obtain a product key for Microsoft Word.
2. Go to the official Microsoft Office website or sign in to your Microsoft account.
3. Locate the "Install Office" or "Install Word" option and click on it.
4. Choose the version of Microsoft Word you want to install (e.g., Office 365, Office 2019, etc.).
5. Follow the on-screen instructions to download the installer file.
6. Once the download is complete, run the installer file.
7. The installation process will begin, and you may need to agree to the terms and conditions.
8. Select the installation options you prefer (e.g., default settings, custom settings, etc.).
9. Wait for the installation to complete.
10. Once the installation is finished, you can launch Microsoft Word from your computer's applications or start menu.

Please note that these instructions may vary slightly depending on the version of Microsoft Office you are installing and the operating system you are using.

Q: and PowerPoint?
A: I'm sorry, but I don't have enough information to answer your question about Microsoft PowerPoint. Can you please provide more context or clarify your question?

Q: How to install Power Point?
A: To install Microsoft PowerPoint, you can follow these steps:

1. Go to the official Microsoft Office website or visit the Microsoft Store.
2. Choose the Office suite that includes PowerPoint (e.g., Office 365, Office 2019).
3. Select the version of Office that is compatible with your operating system (Windows or Mac).
4. Click on the "Buy" or "Get" button to start the purchase or download process.
5. Follow the on-screen instructions to complete the installation.
6. Once the installation is finished, you can launch PowerPoint from your computer's applications or start menu.

Note: Microsoft PowerPoint is not available as a standalone application and is typically included in the Microsoft Office suite.

I would like to know whether these problems are due solely to the memory or whether there is something wrong in my code.
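One possible reading of the code above (an assumption, not confirmed in this thread) is that the custom QA prompt was passed as condense_question_prompt, which controls only the step that rewrites a follow-up into a standalone question, not the step that answers over the retrieved documents. A minimal pure-Python sketch of the chain's two LLM calls, using hypothetical stub names rather than LangChain internals:

```python
# Illustrative sketch of the two LLM calls a conversational retrieval chain
# makes. All names here are hypothetical stand-ins, not library internals.

CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow up question, rephrase the "
    "follow up question to be a standalone question.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)

def run_chain(llm, retriever, qa_template, chat_history, question):
    # Call 1: rewrite the follow-up into a standalone question using the chat
    # history. This is the step condense_question_prompt controls.
    standalone = llm(CONDENSE_TEMPLATE.format(
        chat_history=chat_history, question=question))
    # Call 2: retrieve with the rewritten question and answer over the
    # documents. This is where a custom answering prompt would belong.
    docs = retriever(standalone)
    return llm(qa_template.format(
        context="\n".join(docs), chat_history=chat_history, question=standalone))
```

If the rewriting step receives an answering-style prompt, out-of-domain follow-ups like "and PowerPoint?" may never be expanded into a standalone question, which matches the reported behavior. In the from_llm API, a custom answering prompt is typically supplied via combine_docs_chain_kwargs={"prompt": QA} instead.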

dosubot[bot] commented 8 months ago

Hi, @alisola21! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you reported an issue regarding the memory in the conversational agent not retaining information correctly when asked unrelated questions. It seems like you're unsure if this is a problem with the memory or your code. However, there hasn't been any activity or comments on the issue yet.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation. Let us know if you have any further questions or concerns.

lfoppiano commented 7 months ago

I've solved this by re-implementing the whole thing: I keep the standard load_qa_chain and put the chat history directly at the bottom of the context, together with the retrieved documents. This way I don't have to use a custom prompt (which would no longer work if I changed the chain type to a value other than 'stuff').
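The workaround described above can be sketched as follows. This is a minimal illustration with hypothetical helper names, assuming a plain 'stuff'-style QA chain that receives a single context string:

```python
def build_context(docs, chat_history):
    """Fold the retrieved documents and the running chat history into one
    context string, so a standard QA chain sees the history without any
    custom prompt. Names here are illustrative, not from LangChain."""
    parts = list(docs)
    if chat_history:
        parts.append("Previous conversation:\n" + "\n".join(
            f"Q: {q}\nA: {a}" for q, a in chat_history))
    return "\n\n".join(parts)

# After each turn, append (question, answer) to chat_history and rebuild
# the context before the next call to the QA chain.
```

Because the history travels inside the context rather than inside the prompt template, the same setup keeps working regardless of which prompt the chosen chain type uses internally.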