run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Question]: How to add memory to OpenAI Agent #12788

Closed: JulianOestreich90 closed this issue 2 months ago

JulianOestreich90 commented 2 months ago

Question

I am using an OpenAIAgent as a top-level agent, as shown in the Multi Document Agents example, and I want to enable conversational chat features. How can I add the chat history (user queries and answers) to the top-level calls?

I tried to use ChatMemoryBuffer, but it does not seem to be the feature I am looking for, since it only appears to save the last call:

from llama_index.agent.openai import OpenAIAgent
from llama_index.storage.chat_store.redis import RedisChatStore
from llama_index.core.memory import ChatMemoryBuffer

# persist chat history in Redis so it survives restarts
chat_store = RedisChatStore(redis_url="redis://localhost:6379", ttl=30000)

chat_memory = ChatMemoryBuffer.from_defaults(
    token_limit=3000,
    chat_store=chat_store,
    chat_store_key="user1",
)

# obj_index is the object index over the per-document agents,
# built as in the Multi Document Agents example
top_agent = OpenAIAgent.from_tools(
    tool_retriever=obj_index.as_retriever(similarity_top_k=10),
    memory=chat_memory,
    system_prompt=(
        "You are an agent designed to answer queries about a set of given documents. "
        "Please always use the tools provided to answer a question. "
        "Do not rely on prior knowledge."
    ),
    verbose=True,
)

dosubot[bot] commented 2 months ago

To extend the chat history beyond the last call using ChatMemoryBuffer, you'll need to adjust its logic to manage entire calls as units for storage and retrieval, rather than individual messages. This involves defining what constitutes a "call" in your chat history, which could be a sequence of messages starting with a user query followed by the system's responses, until the next user query.

Here's a conceptual approach:

  1. Modify the get method to identify and manage calls within the chat history. This requires implementing a method to split the chat history into individual calls based on your application's logic.
  2. When trimming the chat history to meet the token limit, ensure entire calls are considered as units. This means removing the oldest calls first until the token limit is satisfied, rather than trimming individual messages.

This approach requires a clear definition of a "call" and might necessitate significant changes to how chat history is managed. The key piece is a helper that splits the chat history into calls according to your definition (referred to here as _identify_calls; a sketch follows below).
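For concreteness, here is a minimal sketch of what that splitting helper and call-level trimming could look like. Note that _identify_calls and trim_to_token_limit are hypothetical names for this outline, not part of the LlamaIndex API, and the token counting is deliberately simplified:

from typing import Callable, List

from llama_index.core.llms import ChatMessage, MessageRole


def _identify_calls(messages: List[ChatMessage]) -> List[List[ChatMessage]]:
    # group the flat history into "calls": each call starts with a user
    # message and runs until the next user message
    calls: List[List[ChatMessage]] = []
    for message in messages:
        if message.role == MessageRole.USER or not calls:
            calls.append([message])
        else:
            calls[-1].append(message)
    return calls


def trim_to_token_limit(
    messages: List[ChatMessage],
    token_limit: int,
    count_tokens: Callable[[str], int],
) -> List[ChatMessage]:
    # drop whole calls, oldest first, until the history fits the limit
    calls = _identify_calls(messages)

    def total_tokens() -> int:
        return sum(
            count_tokens(m.content or "") for call in calls for m in call
        )

    while len(calls) > 1 and total_tokens() > token_limit:
        calls.pop(0)
    return [m for call in calls for m in call]
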

Remember, this is a high-level guide and implementing these changes will depend on your specific requirements and the current structure of your chat history management.

For more detailed guidance, you might need to refer to the ChatMemoryBuffer class's source code in the LlamaIndex codebase and adjust it according to the outlined approach.

logan-markewich commented 2 months ago

@JulianOestreich90 the chat memory buffer is indeed the feature you want, and it's used automatically under the hood. It works by always giving the agent the last X messages that fit under a token limit.

Try increasing the token limit if your agent is filling up the buffer quickly (just be aware that it's possible to cause token-overflow errors if it is too big):

chat_memory = ChatMemoryBuffer.from_defaults(
    token_limit=6000,
    chat_store=chat_store,
    chat_store_key="user1",
)

We have plenty of unit tests; it works fine. You can test it yourself too:

from llama_index.core.llms import ChatMessage

chat_memory.put(ChatMessage(role="user", content="test"))

# get the current buffer, trimmed to the token limit
messages = chat_memory.get()

# get the full stored history, ignoring the token limit
messages = chat_memory.get_all()

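Because the buffer above is backed by RedisChatStore, a separate process should be able to rehydrate the same conversation by pointing at the same store and key (a sketch reusing the chat_store and the "user1" key from earlier):

restored_memory = ChatMemoryBuffer.from_defaults(
    token_limit=6000,
    chat_store=chat_store,
    chat_store_key="user1",
)

# the "test" message put above is still in the store
print(restored_memory.get_all())
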
JulianOestreich90 commented 2 months ago

@logan-markewich Yes, thank you, but that was not the problem. I think the main issue was that I used the query() function instead of the chat() function. When using chat() it worked, but several other problems arose. One of the biggest was that ChatMemoryBuffer doesn't work well with the OpenAIAgent, because it omits chat messages such as tool_calls, so the chat history becomes malformed for the OpenAI API. I had to switch to a ReActAgent to solve this problem.
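In code, the distinction described above looks roughly like this (a sketch assuming the top_agent defined earlier; per the commenter's report, query() does not thread the conversation through memory, while chat() does):

# query() treats each request independently; prior turns are not reused
response = top_agent.query("Summarize the first document.")

# chat() reads from and writes to the attached memory, so follow-ups
# can refer back to earlier turns
response = top_agent.chat("Summarize the first document.")
follow_up = top_agent.chat("Now compare it to the second document.")
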

logan-markewich commented 2 months ago

@JulianOestreich90 I specifically fixed it to not remove tool calls 😓 But I see you are using the redis chat store. Maybe you did not have the latest versions of llama-index-core and llama-index-chat-stores-redis ?

JulianOestreich90 commented 2 months ago

@logan-markewich Yes, I saw your issue and that the pull request was merged, but it didn't work with the latest version.

I was using llama-index 0.10.28 and llama-index-chat-store-redis 0.1.2

logan-markewich commented 2 months ago

Hmm, I just tried it, made a fix, and it seemed to work. Will make a PR.