langchain-ai / langchain

πŸ¦œπŸ”— Build context-aware reasoning applications
https://python.langchain.com
MIT License

Issue with DynamoDBChatMessageHistory Not Respecting max_token_limit in ConversationTokenBufferMemory #14957

Closed samuelbaruffi closed 3 months ago

samuelbaruffi commented 9 months ago

System Info

LangChain Version: langchain 0.0.350, langchain-community 0.0.3, langchain-core 0.1.1
Python Version: 3.10.6
Operating System: macOS
Additional Libraries: boto 2.49.0, boto3 1.34.1

Who can help?

No response


Reproduction

Steps to Reproduce:

Code sample:

import os

import boto3
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import ConversationChain
from langchain.llms import Bedrock
from langchain.memory import ConversationTokenBufferMemory
from langchain.memory.chat_message_histories import DynamoDBChatMessageHistory

session = boto3.Session(
    aws_access_key_id=os.environ.get('AWS_ACCESS_KEY_ID'),
    aws_secret_access_key=os.environ.get('AWS_SECRET_ACCESS_KEY'),
    aws_session_token=os.environ.get('AWS_SESSION_TOKEN'),  
    region_name='us-east-1'
)

dynamodb = session.resource('dynamodb')
chat_sessions_table = dynamodb.Table('SessionTable')

boto3_bedrock = session.client(service_name="bedrock-runtime")

max_tokens_to_sample = 100
temperature = 0
modelId = "anthropic.claude-instant-v1"
top_k = 250
top_p = 0.999

model_kwargs = {
    "temperature": temperature,
    "max_tokens_to_sample": max_tokens_to_sample,
    "top_k": top_k,
    "top_p": top_p
}

llm = Bedrock(
    client=boto3_bedrock,
    model_id=modelId,
    region_name='us-east-1',
    model_kwargs=model_kwargs,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

message_history = DynamoDBChatMessageHistory(table_name="SessionTable", session_id="10", boto3_session=session)

memory = ConversationTokenBufferMemory(
    llm=llm,  # Use the Bedrock instance
    max_token_limit=100,  
    return_messages=True,
    chat_memory=message_history,
    ai_prefix="A",
    human_prefix="H"
)

# Add the memory to the chain
conversation = ConversationChain(
    llm=llm, verbose=True, memory=memory
)

conversation.predict(input="Hello!")

memory.load_memory_variables({})

Expected behavior

Expected Behavior: The conversation history should be pruned so that it stays within the max_token_limit of 100 tokens set in ConversationTokenBufferMemory.

Actual Behavior: The DynamoDBChatMessageHistory does not limit the token count according to the max_token_limit set in ConversationTokenBufferMemory and keeps saving every message to the DynamoDB table.

dosubot[bot] commented 9 months ago

πŸ€–

Hello @samuelbaruffi!

I'm Dosu, a friendly bot here to lend a hand. Just think of me as your personal assistant in the world of LangChain while we wait for a human maintainer to join us. Let's sort out those bugs, answer your questions, and perhaps even help you contribute to the project. Let's make this a smooth and pleasant experience together!

Based on the information you've provided and the context from the LangChain repository, it seems that the issue you're experiencing is related to the DynamoDBChatMessageHistory class not respecting the max_token_limit set in ConversationTokenBufferMemory.

The ConversationTokenBufferMemory class does respect the max_token_limit as seen in the save_context method. It prunes the buffer if it exceeds the max_token_limit by popping out the oldest messages until the total number of tokens is within the limit. You can see this in the token_buffer.py file.

However, the DynamoDBChatMessageHistory class, which is responsible for saving the chat history to a DynamoDB table, might not have a similar pruning logic implemented. This could be the reason why it's not respecting the max_token_limit and continues to save all items in memory on the DynamoDB table.

The DynamoDBChatMessageHistory class in the LangChain codebase is a subclass of BaseChatMessageHistory and is designed to store chat message history in AWS DynamoDB. It has three main methods: messages, add_message, and clear. You can see this in the dynamodb.py file.
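In rough outline, the interface looks like this (signatures paraphrased from BaseChatMessageHistory; a sketch, not the full implementation):

from typing import List
from langchain.schema import BaseChatMessageHistory, BaseMessage

class DynamoDBChatMessageHistory(BaseChatMessageHistory):
    @property
    def messages(self) -> List[BaseMessage]:
        """Read the full message history for this session from the table."""
        ...

    def add_message(self, message: BaseMessage) -> None:
        """Append a single message to the stored history."""
        ...

    def clear(self) -> None:
        """Delete the session's history from the table."""
        ...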

To resolve this issue, you might need to implement a similar pruning logic in the add_message method (or equivalent) of DynamoDBChatMessageHistory to ensure it respects the max_token_limit.
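For illustration, such a pruning subclass could look roughly like the following. The class name TokenLimitedDynamoDBHistory and the count_tokens parameter are hypothetical (not part of LangChain), and the clear-and-rewrite step is a naive way to persist the pruned history:

from typing import Callable, List

from langchain.memory.chat_message_histories import DynamoDBChatMessageHistory
from langchain.schema import BaseMessage


class TokenLimitedDynamoDBHistory(DynamoDBChatMessageHistory):
    # Hypothetical subclass: prunes the stored history after every write.

    def __init__(self, *args, count_tokens: Callable[[str], int],
                 max_token_limit: int = 100, **kwargs):
        super().__init__(*args, **kwargs)
        self._count_tokens = count_tokens      # e.g. llm.get_num_tokens
        self._max_token_limit = max_token_limit

    def add_message(self, message: BaseMessage) -> None:
        super().add_message(message)
        # Re-read the stored history and drop the oldest messages until
        # the total token count fits within the limit.
        buffer: List[BaseMessage] = self.messages
        while buffer and sum(self._count_tokens(m.content) for m in buffer) > self._max_token_limit:
            buffer.pop(0)
        # Persist the pruned list back to DynamoDB (clear and rewrite).
        self.clear()
        for m in buffer:
            super().add_message(m)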

I also found a similar solved issue: ConversationTokenBufferMemory does not behave as expected. This issue was referenced by two pull requests: Update ConversationTokenBufferMemory to Enforce Token Limit in load_memory_variables and Resolve Buffer Clearing Issue in Summary Generation. These might provide some insights into how to resolve your issue.

Please let me know if this helps or if you need further assistance.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a πŸ‘ if this is helpful and πŸ‘Ž if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

keenborder786 commented 9 months ago

@samuelbaruffi The BaseChatMessageHistory will store the entire conversation without pruning the Human and AI messages. At runtime, the ConversationTokenBufferMemory will retrieve messages from BaseChatMessageHistory, which, in your case, is DynamoDB. Subsequently, it will prune the messages to the specified token limit in the prompt. Essentially, Memory and Storage are distinct components in Langchain. Memory determines how to construct the past context, while storage actually preserves the entire conversation history. In this scenario, the ConversationTokenBufferMemory prunes the messages obtained from DynamoDB at runtime.

samuelbaruffi commented 9 months ago

@keenborder786 thanks for the answer. In my tests that does not seem to be the case. I've tested multiple times, and the Current conversation: section (the chat history) keeps growing without respecting the token limit I set. In my example I set max_token_limit=100, yet I can see that the history passed to the LLM under Current conversation: is much bigger than 100 tokens.

Let me know if I can help by providing my test scenarios.

954-Ivory commented 8 months ago

https://github.com/langchain-ai/langchain/blob/9e95699277fe0db3bfee1654276a43bfba9ecc64/libs/langchain/langchain/memory/token_buffer.py#L49-L59

The problem happens here:

https://github.com/langchain-ai/langchain/blob/9e95699277fe0db3bfee1654276a43bfba9ecc64/libs/langchain/langchain/memory/token_buffer.py#L58

buffer is just a plain list built from the history storage, and chat_memory: BaseChatMessageHistory does not implement a pop method of its own, so popping from the list never removes anything from the underlying storage.
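Concretely, the pruning in save_context looks roughly like this (paraphrased from the linked token_buffer.py):

buffer = self.chat_memory.messages  # a fresh list rebuilt from storage on each access
curr_buffer_length = self.llm.get_num_tokens_from_messages(buffer)
if curr_buffer_length > self.max_token_limit:
    pruned_memory = []
    while curr_buffer_length > self.max_token_limit:
        # pop(0) mutates only this local list; nothing is deleted
        # from the DynamoDB table behind chat_memory.
        pruned_memory.append(buffer.pop(0))
        curr_buffer_length = self.llm.get_num_tokens_from_messages(buffer)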

samuelbaruffi commented 8 months ago

Thank you @954-Ivory. So are you saying it's not possible to use max_token_limit with DynamoDB as memory? Could you elaborate a bit? I'd appreciate it.

954-Ivory commented 8 months ago

Thank you @954-Ivory. So are you saying it's not possible to use max_token_limit with DynamoDB as memory? Could you elaborate a bit? I'd appreciate it.

Yes, I think it's a bug. At present, I am trying to work around it with a monkey patch.
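For example (a minimal sketch only, not the actual patch from this thread; it assumes clearing and rewriting the session's history is acceptable for your table):

from langchain.memory import ConversationTokenBufferMemory

_original_save_context = ConversationTokenBufferMemory.save_context

def _save_context_and_persist(self, inputs, outputs):
    _original_save_context(self, inputs, outputs)
    # Prune against the stored history, then write the pruned list back.
    buffer = self.chat_memory.messages
    while buffer and self.llm.get_num_tokens_from_messages(buffer) > self.max_token_limit:
        buffer.pop(0)
    self.chat_memory.clear()
    for message in buffer:
        self.chat_memory.add_message(message)

ConversationTokenBufferMemory.save_context = _save_context_and_persist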

samuelbaruffi commented 8 months ago

Thank you! Once you have it working, could you paste the code for the workaround? Appreciate it!

954-Ivory commented 8 months ago

Thank you! Once you have it working, could you paste the code for the workaround? Appreciate it!

I used Redis to store the history. You can adapt it to DynamoDB.

https://github.com/langchain-ai/langchain/discussions/16527