run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Bug]: Llama-index ChatSummaryBuffer didn't summarise the chat history #14958

Closed: eatesaam-digital-diraction closed this issue 1 month ago

eatesaam-digital-diraction commented 1 month ago

Bug Description

ChatSummaryMemoryBuffer does not summarize the chat history.

Version

0.10.57

Steps to Reproduce

import os

import tiktoken

from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.memory import ChatSummaryMemoryBuffer
from llama_index.llms.openai import OpenAI

os.environ["OPENAI_API_KEY"] = "sk-"

chat_history = [
    ChatMessage(role=MessageRole.USER, content="What is LlamaIndex?"),
    ChatMessage(
        role=MessageRole.ASSISTANT,
        content="LlamaIndex is the leading data framework for building LLM applications",
    ),
    ChatMessage(role=MessageRole.USER, content="Can you give me some more details?"),
    ChatMessage(
        role=MessageRole.ASSISTANT,
        content="""LlamaIndex is a framework for building context-augmented LLM applications. Context augmentation refers to any use case that applies LLMs on top of your private or domain-specific data. Some popular use cases include the following: Question-Answering Chatbots (commonly referred to as RAG systems, which stands for "Retrieval-Augmented Generation"), Document Understanding and Extraction, Autonomous Agents that can perform research and take actions. LlamaIndex provides the tools to build any of these use cases from prototype to production. The tools allow you to both ingest/process this data and implement complex query workflows combining data access with LLM prompting.""",
    ),
]

model = "gpt-4-0125-preview"
summarizer_llm = OpenAI(model=model, max_tokens=256)  # the OpenAI class takes `model=`, not `model_name=`
tokenizer_fn = tiktoken.encoding_for_model(model).encode
memory = ChatSummaryMemoryBuffer.from_defaults(
    chat_history=chat_history,
    llm=summarizer_llm,
    token_limit=1000,
    tokenizer_fn=tokenizer_fn,
)

history = memory.get()

print(history)

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 1 month ago

@eatesaam-digital-diraction Hello there! I'm here to assist you with any issues, questions, or contributions you may have. Let's work together to resolve your concerns.

The ChatSummaryMemoryBuffer did not summarize the chat history as expected in version 0.10.57 because the _summarize_oldest_chat_history method only performs summarization when there is new information to summarize. Specifically, if chat_history_to_be_summarized contains a single message whose role is MessageRole.SYSTEM, the method returns that message without performing any summarization.

def _summarize_oldest_chat_history(
    self, chat_history_to_be_summarized: List[ChatMessage]
) -> ChatMessage:
    """Use the llm to summarize the messages that do not fit into the buffer."""
    # Only summarize if there is new information to be summarized
    if (
        len(chat_history_to_be_summarized) == 1
        and chat_history_to_be_summarized[0].role == MessageRole.SYSTEM
    ):
        return chat_history_to_be_summarized[0]

    summarize_prompt = ChatMessage(
        role=MessageRole.SYSTEM,
        content=self._get_prompt_to_summarize(chat_history_to_be_summarized),
    )
    # TODO: Maybe it is better to pass a list of history to llm
    return ChatMessage(
        role=MessageRole.SYSTEM,
        content=self.llm.chat([summarize_prompt]).message.content,
    )

This condition prevents the summarization process from occurring if the chat history to be summarized does not contain new information beyond a single system message [1].
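To make that guard concrete, here is a standalone sketch (is_passed_through is a hypothetical helper name, not part of the library; the check is just extracted for illustration): a history consisting of a single SYSTEM message, i.e. an already-generated summary, is passed through without another LLM call.

from typing import List

from llama_index.core.llms import ChatMessage, MessageRole


def is_passed_through(history: List[ChatMessage]) -> bool:
    # Mirrors the guard in _summarize_oldest_chat_history above
    return len(history) == 1 and history[0].role == MessageRole.SYSTEM


old_summary = ChatMessage(role=MessageRole.SYSTEM, content="Summary of earlier turns ...")
print(is_passed_through([old_summary]))  # True: returned as-is, no LLM call
print(is_passed_through([old_summary, ChatMessage(role=MessageRole.USER, content="Hi")]))  # False: summarization runs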

Additionally, the ChatSummaryMemoryBuffer aims to limit the chat history to a certain token length and iteratively summarizes all messages that do not fit in the memory buffer. This is useful for limiting costs and latency, as the summarization prompt uses and generates fewer tokens than including the entire history. The newer ChatSummaryMemoryBuffer also provides more flexibility, letting users control which chat history is retained, unlike the original ChatMemoryBuffer, which simply truncates the history after a certain number of tokens [2][3].
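As a rough illustration of that behavior (reusing chat_history, summarizer_llm, and tokenizer_fn from the reproduction script; the exact summary text depends on the model), shrinking token_limit forces the oldest messages out of the buffer and into the summarizer:

memory = ChatSummaryMemoryBuffer.from_defaults(
    chat_history=chat_history,
    llm=summarizer_llm,
    token_limit=100,  # small enough that the long assistant reply no longer fits
    tokenizer_fn=tokenizer_fn,
)

history = memory.get()
# The first message should now be a SYSTEM message holding the summary,
# followed by the most recent messages that still fit within the limit.
print(history[0].role, history[0].content)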

To continue talking to Dosu, mention @dosu.

logan-markewich commented 1 month ago

@eatesaam-digital-diraction in your example, the data you put into the memory doesn't exceed the token limit, so no summarization is done
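A quick way to verify this is to count the tokens in the example history with the same tiktoken encoder (a sketch; the buffer's internal accounting may add small per-message overhead):

# Total tokens across all message contents in the example chat_history.
total = sum(len(tokenizer_fn(m.content or "")) for m in chat_history)
print(total)  # well under the 1000-token limit, so memory.get() returns the history unchanged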

eatesaam-digital-diraction commented 1 month ago

@logan-markewich Thank you for your assistance and for highlighting the issue. I got your point.