run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Question]: Want to reduce the context memory token size #15320

Open avisheka93 opened 1 month ago

avisheka93 commented 1 month ago


Question

I was using a token limit of 3,900 with ChatMemoryBuffer from LlamaIndex.

I was running into token-limit issues with OpenAI, so I reduced the limit to 2,000:

chat_memory = ChatMemoryBuffer.from_defaults(
    token_limit=2000,
    chat_store=chat_store,
    chat_store_key=create_key()
)

Now I'm getting the error below:

ValueError: Initial token count exceeds token limit

dosubot[bot] commented 1 month ago

To resolve the "ValueError: Initial token count exceeds token limit" error after reducing the token limit to 2000 in ChatMemoryBuffer, you need to ensure that the initial token count passed to the get method does not exceed the new token limit. Here is an example of how you can adjust the initial token count accordingly:

from llama_index.core.base.llms.types import ChatMessage, MessageRole
from llama_index.core.memory.chat_memory_buffer import ChatMemoryBuffer
from llama_index.core.utils import get_tokenizer

# Assuming you have reduced the token limit to 2000
token_limit = 2000
tokenizer = get_tokenizer()

# Create a ChatMemoryBuffer with the new token limit
USER_CHAT_MESSAGE = ChatMessage(role=MessageRole.USER, content="first message")
memory = ChatMemoryBuffer.from_defaults(token_limit=token_limit, chat_history=[USER_CHAT_MESSAGE])

# Ensure the initial token count does not exceed the token limit
initial_token_count = min(token_limit, len(tokenizer(str(USER_CHAT_MESSAGE.content))))

# Get the chat history from the memory
history = memory.get(initial_token_count=initial_token_count)

# The history should contain the message if the initial token count is within the limit
assert len(history) == 1
assert history[0].content == USER_CHAT_MESSAGE.content

This code ensures that the initial token count passed to the get method is within the new token limit, thereby avoiding the "ValueError: Initial token count exceeds token limit" error. Additionally, the ChatMemoryBuffer class has built-in validation to ensure the token limit is set correctly and provides methods to manage chat history efficiently [1][2][3][4][5].
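For reference, here is a minimal sketch (the values are hypothetical) of where this check bites: get raises the error as soon as the initial_token_count argument alone exceeds token_limit, before any history trimming happens.

# Minimal sketch: reproduce the validation that raises the error.
# get() rejects any initial_token_count larger than the buffer's token_limit.
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=2000)

try:
    # e.g. a prompt/context that already measures ~2500 tokens
    memory.get(initial_token_count=2500)
except ValueError as err:
    print(err)  # -> Initial token count exceeds token limit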

To continue talking to Dosu, mention @dosu.

avisheka93 commented 1 month ago

@dosu I'm using RedisChatStore as memory:

def redis_chat(redis_client):
    chat_store = RedisChatStore(redis_client=redis_client)
    return chat_store

chat_store = redis_chat(redis_client)

chat_memory = ChatMemoryBuffer.from_defaults(
    token_limit=2000,
    chat_store=chat_store,
    chat_store_key=create_key()
)

dosubot[bot] commented 1 month ago

To resolve the "ValueError: Initial token count exceeds token limit" error while using RedisChatStore as the chat store, you need to ensure that the initial token count does not exceed the token limit set in your ChatMemoryBuffer.

Here's how you can handle this:

  1. Check the initial token count before getting the chat history:

    from llama_index.core.memory.chat_memory_buffer import ChatMemoryBuffer
    from llama_index.storage.chat_store.redis import RedisChatStore
    
    def redis_chat(redis_client):
        chat_store = RedisChatStore(redis_client=redis_client)
        return chat_store
    
    redis_client = ...  # Your Redis client initialization
    chat_store = redis_chat(redis_client)
    
    token_limit = 2000
    initial_tokens = 50  # Example initial token count
    
    if initial_tokens > token_limit:
        raise ValueError("Initial token count exceeds token limit")
    
    chat_memory = ChatMemoryBuffer.from_defaults(
        token_limit=token_limit,
        chat_store=chat_store,
        # chat_store_key=create_key()
    )
    
    history = chat_memory.get(initial_token_count=initial_tokens)
  2. Adjust the initial token count or token limit accordingly:

    initial_tokens = 50  # Example initial token count
    token_limit = 2000  # Your token limit
    
    # Ensure initial tokens do not exceed the token limit
    if initial_tokens > token_limit:
        initial_tokens = token_limit
    
    chat_memory = ChatMemoryBuffer.from_defaults(
        token_limit=token_limit,
        chat_store=chat_store,
        # chat_store_key=create_key()
    )
    
    history = chat_memory.get(initial_token_count=initial_tokens)

This ensures that the initial token count passed to the get method is within the token limit, thereby avoiding the "ValueError: Initial token count exceeds token limit" error [1][2].
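If you don't want to hard-code a value like 50, one option is to measure the prompt you are about to send with the same get_tokenizer helper shown in the earlier snippet. This is only a sketch, and the message text is a made-up placeholder:

from llama_index.core.utils import get_tokenizer

tokenizer = get_tokenizer()

next_user_message = "Can you tell me the status of my order?"  # placeholder input

# Measure the incoming prompt instead of guessing a fixed number
initial_tokens = len(tokenizer(next_user_message))

if initial_tokens > token_limit:
    initial_tokens = token_limit  # or raise / truncate, depending on your needs

history = chat_memory.get(initial_token_count=initial_tokens)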

To continue talking to Dosu, mention @dosu.

avisheka93 commented 1 month ago

@dosu How do I integrate the history param?

initial_tokens = 50  # Example initial token count
token_limit = 2000  # Your token limit

# Ensure initial tokens do not exceed the token limit
if initial_tokens > token_limit:
    initial_tokens = token_limit

chat_memory = ChatMemoryBuffer.from_defaults(
    token_limit=token_limit,
    chat_store=chat_store,
    chat_store_key=create_key()
)

history = chat_memory.get(initial_token_count=initial_tokens)

In the chat engine:

def redis_chat(redis_client):
    chat_store = RedisChatStore(redis_client=redis_client)
    return chat_store

chat_store = redis_chat(redis_client)

chat_memory = ChatMemoryBuffer.from_defaults(
    token_limit=3900,
    chat_store=chat_store,
    chat_store_key=create_key()
)

chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    memory=chat_memory,
    llm=llm,
    context_prompt=ContextPrompt,
    condense_prompt=CondensePrompt,
    streaming=True,
    verbose=True,
)

dosubot[bot] commented 1 month ago

To integrate the history parameter into the chat engine setup, you need to ensure that the chat history is properly managed and utilized within the engine. Here's how you can do it:

  1. Initialize the ChatMemoryBuffer with the history parameter:

    from llama_index.core.memory.chat_memory_buffer import ChatMemoryBuffer
    from llama_index.storage.chat_store.redis import RedisChatStore
    from llama_index.core.llms import ChatMessage, MessageRole
    
    def redis_chat(redis_client):
        chat_store = RedisChatStore(redis_client=redis_client)
        return chat_store
    
    redis_client = ...  # Your Redis client initialization
    chat_store = redis_chat(redis_client)
    
    # Example chat history
    chat_history = [
        ChatMessage(role=MessageRole.ASSISTANT, content="Hello, how can I help you?"),
        ChatMessage(role=MessageRole.USER, content="I need some information about my order.")
    ]
    
    token_limit = 2000
    chat_memory = ChatMemoryBuffer.from_defaults(
        token_limit=token_limit,
        chat_store=chat_store,
        chat_history=chat_history  # Pass the chat history here
    )
  2. Integrate the ChatMemoryBuffer into the chat engine:

    from llama_index.core.chat_engine.condense_plus_context import CondensePlusContextChatEngine
    from llama_index.llms.openai import OpenAI  # or whichever LLM you are using

    # ContextPrompt and CondensePrompt are your own prompt templates,
    # the same ones you already pass to index.as_chat_engine

    llm = OpenAI()  # Initialize your LLM

    # `index` is the same index you already build your chat engine from
    chat_engine = CondensePlusContextChatEngine.from_defaults(
        retriever=index.as_retriever(),
        llm=llm,
        memory=chat_memory,
        context_prompt=ContextPrompt,
        condense_prompt=CondensePrompt,
        verbose=True,
    )

    # Now you can use the chat engine with the provided chat history
    response = chat_engine.chat("Can you tell me the status of my order?")
    print(response.response)

By passing the chat_history parameter to the ChatMemoryBuffer and integrating it into the chat engine, you ensure that the chat history is properly managed and utilized within the engine [1][2][3][4][5].
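As a rough sketch of wiring that history into an actual call (reusing the chat_memory and chat_engine from your own snippets, with a placeholder question): chat() and stream_chat() take an optional chat_history argument in the chat-engine interface, which lets you hand the engine an explicit history instead of relying only on what it reads from memory. It is worth double-checking the exact behavior against your installed version.

# Pull the token-limited history out of the buffer...
history = chat_memory.get()

# ...and pass it explicitly alongside the new message
response = chat_engine.chat(
    "Can you tell me the status of my order?",  # placeholder question
    chat_history=history,
)
print(response.response)

# Since your engine is built with streaming=True, the streaming variant is:
streaming_response = chat_engine.stream_chat(
    "Can you tell me the status of my order?",
    chat_history=history,
)
for token in streaming_response.response_gen:
    print(token, end="")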

To continue talking to Dosu, mention @dosu.