Closed eatesaam-digital-diraction closed 1 month ago
@eatesaam-digital-diraction Hello there! I'm here to assist you with any issues, questions, or contributions you may have. Let's work together to resolve your concerns.
The ChatSummaryMemoryBuffer did not summarize the chat history as expected in version 0.10.57 because the `_summarize_oldest_chat_history` method only performs summarization when there is new information to summarize. Specifically, if `chat_history_to_be_summarized` contains only one message and that message's role is `MessageRole.SYSTEM`, the method returns that message without performing any summarization:
```python
def _summarize_oldest_chat_history(
    self, chat_history_to_be_summarized: List[ChatMessage]
) -> ChatMessage:
    """Use the llm to summarize the messages that do not fit into the
    buffer.
    """
    # Only summarize if there is new information to be summarized
    if (
        len(chat_history_to_be_summarized) == 1
        and chat_history_to_be_summarized[0].role == MessageRole.SYSTEM
    ):
        return chat_history_to_be_summarized[0]
    summarize_prompt = ChatMessage(
        role=MessageRole.SYSTEM,
        content=self._get_prompt_to_summarize(chat_history_to_be_summarized),
    )
    # TODO: Maybe it is better to pass a list of history to llm
    return ChatMessage(
        role=MessageRole.SYSTEM,
        content=self.llm.chat([summarize_prompt]).message.content,
    )
```
This condition prevents the summarization process from occurring if the chat history to be summarized does not contain new information beyond a single system message [1].
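That guard can be sketched in isolation with stand-in classes (hypothetical minimal versions of `ChatMessage` and `MessageRole`, not the real llama_index types): a lone `SYSTEM` message, i.e. a previous summary, is passed through untouched, while anything else would reach the LLM summarization call.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class MessageRole(str, Enum):
    SYSTEM = "system"
    USER = "user"
    ASSISTANT = "assistant"


@dataclass
class ChatMessage:
    role: MessageRole
    content: str


def would_skip_summarization(history: List[ChatMessage]) -> bool:
    """Mirror the guard: a single SYSTEM message (an existing summary)
    is returned unchanged instead of being re-summarized."""
    return len(history) == 1 and history[0].role == MessageRole.SYSTEM


# A lone SYSTEM message (the previous summary) is not re-summarized...
assert would_skip_summarization(
    [ChatMessage(MessageRole.SYSTEM, "Summary of earlier turns.")]
)
# ...but any new user/assistant content would trigger the LLM call.
assert not would_skip_summarization(
    [
        ChatMessage(MessageRole.SYSTEM, "Summary of earlier turns."),
        ChatMessage(MessageRole.USER, "What is LlamaIndex?"),
    ]
)
```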
Additionally, the ChatSummaryMemoryBuffer aims to limit the chat history to a certain token length and iteratively summarizes all messages that do not fit in the memory buffer. This is useful for limiting cost and latency, since the summarization prompt uses and generates fewer tokens than including the entire history. The newer ChatSummaryMemoryBuffer also provides more flexibility, allowing users to control which chat history is retained, unlike the original ChatMemoryBuffer, which simply truncates the history after a certain number of tokens [2][3].
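The difference between the two buffers can be sketched with hypothetical helpers (a naive whitespace split stands in for a real tokenizer, and a placeholder lambda stands in for the LLM summarization call):

```python
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]


def count_tokens(messages: List[str], tokenize: Tokenizer) -> int:
    return sum(len(tokenize(m)) for m in messages)


def truncate_history(messages, token_limit, tokenize):
    """ChatMemoryBuffer-style behavior: drop the oldest messages until
    the remainder fits; the dropped content is lost entirely."""
    kept = list(messages)
    while kept and count_tokens(kept, tokenize) > token_limit:
        kept.pop(0)
    return kept


def summarize_history(messages, token_limit, tokenize, summarize):
    """ChatSummaryMemoryBuffer-style behavior: messages that do not fit
    are folded into a single summary message instead of being dropped."""
    kept = list(messages)
    overflow = []
    while kept and count_tokens(kept, tokenize) > token_limit:
        overflow.append(kept.pop(0))
    if overflow:
        kept.insert(0, summarize(overflow))
    return kept


history = ["one two three", "four five six", "seven eight nine"]
# Truncation simply forgets the oldest message:
print(truncate_history(history, 6, str.split))
# → ['four five six', 'seven eight nine']
# Summarization keeps a condensed trace of it instead:
print(summarize_history(history, 6, str.split, lambda ms: "[summary]"))
# → ['[summary]', 'four five six', 'seven eight nine']
```

The real buffer re-summarizes iteratively as the history grows; this sketch only shows the single-pass idea.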
@eatesaam-digital-diraction In your example, the data you put into the memory doesn't exceed the token limit, so no summarization is done.
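That threshold check can be illustrated without calling the API: summarization only kicks in once the measured token count of the history exceeds `token_limit`. A naive whitespace tokenizer stands in for tiktoken here, purely as a sketch:

```python
from typing import Callable, List


def needs_summarization(
    messages: List[str],
    token_limit: int,
    tokenize: Callable[[str], List[str]] = str.split,
) -> bool:
    """Summarization is only triggered when the full history exceeds
    the buffer's token_limit (tokenize is a stand-in for tiktoken)."""
    total = sum(len(tokenize(m)) for m in messages)
    return total > token_limit


short_history = ["What is LlamaIndex?", "A data framework for LLM apps."]
# Fits within the limit, so the history is returned untouched:
assert not needs_summarization(short_history, token_limit=1000)
# A much smaller limit would force the buffer to summarize:
assert needs_summarization(short_history, token_limit=5)
```

Lowering `token_limit` in the reproduction script (or adding more history) is the simplest way to see the summarization path actually run.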
@logan-markewich Thank you for your assistance and for highlighting the issue. I got your point.
Bug Description
ChatSummaryMemoryBuffer didn't summarize the chat history.
Version
0.10.57
Steps to Reproduce
```python
import os

os.environ["OPENAI_API_KEY"] = "sk-"

from llama_index.core.memory import ChatSummaryMemoryBuffer
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.llms.openai import OpenAI
import tiktoken

chat_history = [
    ChatMessage(role=MessageRole.USER, content="What is LlamaIndex?"),
    ChatMessage(
        role=MessageRole.ASSISTANT,
        content="LlamaIndex is the leading data framework for building LLM applications",
    ),
    ChatMessage(role=MessageRole.USER, content="Can you give me some more details?"),
    ChatMessage(
        role=MessageRole.ASSISTANT,
        content="""LlamaIndex is a framework for building context-augmented LLM applications.
Context augmentation refers to any use case that applies LLMs on top of your private or
domain-specific data. Some popular use cases include the following: Question-Answering
Chatbots (commonly referred to as RAG systems, which stands for "Retrieval-Augmented
Generation"), Document Understanding and Extraction, Autonomous Agents that can perform
research and take actions. LlamaIndex provides the tools to build any of these above use
cases from prototype to production. The tools allow you to both ingest/process this data
and implement complex query workflows combining data access with LLM prompting.""",
    ),
]

model = "gpt-4-0125-preview"
summarizer_llm = OpenAI(model=model, max_tokens=256)
tokenizer_fn = tiktoken.encoding_for_model(model).encode
memory = ChatSummaryMemoryBuffer.from_defaults(
    chat_history=chat_history,
    llm=summarizer_llm,
    token_limit=1000,
    tokenizer_fn=tokenizer_fn,
)

history = memory.get()
print(history)
```
Relevant Logs/Tracebacks
No response