langgenius / dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
https://dify.ai

Unlimited context window on the agent consuming exorbitant amounts of tokens #8692

Closed juniorsereno closed 2 months ago

juniorsereno commented 2 months ago

Self Checks

Dify version

0.8.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  1. Create an agent and configure its prompt, tools, etc.
  2. Start interacting with it, with several searches and long texts (or not).
  3. After a high number of interactions with that same agent, approximately 100 to 200, watch the token spend gradually increase until it reaches an exorbitant number.

[Screenshot, 2024-09-23: token usage per response climbing across interactions]

I have an agent that I use as a personal assistant; I disabled it because it was spending 125,000 to 200,000 tokens per response after 200 interactions.

I suggest that the community or the devs add a context window to the Agent; otherwise it will only ever be used for testing, never for a real application, because it will be unsustainable in the long run. It forces us to build a chatflow just to get a context window and avoid this expense.

I love the ease of the Agent, but with this problem it becomes unfeasible to use over the medium to long term.

✔️ Expected Behavior

A configurable context window limit on the agent.

❌ Actual Behavior

Token spend grows gradually and unsustainably after a number of interactions (100 or more) with the agent.

dosubot[bot] commented 2 months ago

@juniorsereno Hello there! I'm Dosu, a bot here to help you with bug fixes, questions, and contributions while you wait for a human maintainer.

I found a similar unsolved discussion regarding limiting the number of messages to control token usage [1].

To address the issue of exorbitant token usage after multiple interactions, you can implement a context window limit for the agent. This can be done by modifying the iteration limit of the agent in the agent settings. Specifically, you can adjust the max_iteration_steps parameter in the FunctionCallAgentRunner class to control the number of iterations the agent can perform, thereby managing token usage.

Here is an example of how you can set this limit:

class FunctionCallAgentRunner(BaseAgentRunner):
    def run(self, message: Message, query: str, **kwargs: Any) -> Generator[LLMResultChunk, None, None]:
        # Other method code

        iteration_step = 1
        max_iteration_steps = min(app_config.agent.max_iteration, 5) + 1  # Set the max iteration steps

        # Other method code

        while function_call_state and iteration_step <= max_iteration_steps:
            function_call_state = False

            if iteration_step == max_iteration_steps:
                # The last iteration, remove all tools
                prompt_messages_tools = []

            # Other loop code

            iteration_step += 1

        # Other method code

By setting max_iteration_steps to a desired value, you can limit the number of iterations and thus control the token usage effectively [2].
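The capping pattern in the snippet above can be sketched as a minimal standalone example. Note that `compute_max_steps` is a hypothetical helper written for illustration, not part of Dify's codebase; the `min(configured, 5) + 1` expression mirrors the line shown above.

```python
# Hypothetical standalone sketch of the iteration cap shown above.
# min(configured, hard_cap) + 1 bounds the loop no matter how high the
# configured value is; the final step runs with tools removed.
def compute_max_steps(configured_max_iteration: int, hard_cap: int = 5) -> int:
    """Mirror the cap: at most hard_cap tool iterations, plus one closing step."""
    return min(configured_max_iteration, hard_cap) + 1

# Even with a configured max_iteration of 10, the loop stops after 6 steps:
steps_taken = 0
function_call_state = True
max_steps = compute_max_steps(10)  # 6
for step in range(1, max_steps + 1):
    if not function_call_state:
        break
    steps_taken += 1
    if step == max_steps:
        # last iteration: tools removed, model must answer in plain text
        function_call_state = False

print(steps_taken)  # 6
```

This only bounds tool-call iterations within a single response; it does not limit the conversation history, which is the token growth the issue reports.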

Additionally, token usage is monitored and controlled in other parts of the codebase, such as the CotAgentRunner and AgentChatAppRunner classes, where functions like increase_usage and get_pre_calculate_rest_tokens help manage token consumption [3][4].

To continue talking to Dosu, mention @dosu.

juniorsereno commented 2 months ago

@dosu How do I make this modification with Dify already installed? What is the path to this file after installation on the VPS?

And how do I make this modification with Dify installed via Easypanel? Do you have that information?

dosubot[bot] commented 2 months ago

To modify the max_iteration_steps parameter in the FunctionCallAgentRunner class for a Dify installation on a VPS and Easypanel, follow these steps:

For VPS Installation

  1. Locate the Dify Source Code Directory: Navigate to the directory where you cloned the Dify source code. Typically, it would be something like:

    cd /path/to/dify
  2. Modify the FunctionCallAgentRunner Class: Open the fc_agent_runner.py file located in the api/core/agent directory:

    nano api/core/agent/fc_agent_runner.py
  3. Update the max_iteration_steps Parameter: Locate the following line in the run method:

    max_iteration_steps = min(app_config.agent.max_iteration, 5) + 1

    Change the value 5 to your desired maximum number of iteration steps. For example, to set it to 10:

    max_iteration_steps = min(app_config.agent.max_iteration, 10) + 1
  4. Restart the Docker Containers: Navigate to the Docker directory and restart the containers:

    cd docker
    docker compose down
    docker compose up -d

For Easypanel Installation

  1. Access the Easypanel Interface: Log in to your Easypanel dashboard and navigate to the Dify application.

  2. Open the Terminal: Use the terminal feature in Easypanel to access the Dify container.

  3. Modify the FunctionCallAgentRunner Class: Open the fc_agent_runner.py file located in the api/core/agent directory:

    nano /path/to/dify/api/core/agent/fc_agent_runner.py
  4. Update the max_iteration_steps Parameter: Locate and modify the max_iteration_steps parameter as described above.

  5. Restart the Docker Containers: Use the Easypanel interface to restart the Dify application. This will apply the changes.

By following these steps, you can modify the max_iteration_steps parameter in both VPS and Easypanel installations to control the token usage effectively [1].

For more detailed instructions on deploying updates in Easypanel, you can refer to the Dify documentation [2][3].

To continue talking to Dosu, mention @dosu.

juniorsereno commented 2 months ago

@dosu The Dify container in Easypanel does not have a folder called api. Apparently it is not possible to make this adjustment in the Easypanel installation.

dosubot[bot] commented 2 months ago

To modify the max_iteration_steps parameter in a Dify installation on Easypanel, even if the container does not have an api folder, you can follow these steps:

  1. Access the Container's File System: Use the terminal feature in Easypanel to access the Dify container.

  2. Locate the fc_agent_runner.py File: The file might be located in a different directory. You can use the find command to locate it:

    find / -name fc_agent_runner.py
  3. Modify the max_iteration_steps Parameter: Once you locate the fc_agent_runner.py file, open it with a text editor like nano:

    nano /path/to/fc_agent_runner.py

    Locate the following line in the run method:

    max_iteration_steps = min(app_config.agent.max_iteration, 5) + 1

    Change the value 5 to your desired maximum number of iteration steps. For example, to set it to 10:

    max_iteration_steps = min(app_config.agent.max_iteration, 10) + 1
  4. Restart the Dify Application: Use the Easypanel interface to restart the Dify application. This will apply the changes.

If you still cannot find the fc_agent_runner.py file, you might need to check the Dify documentation or contact the support team for further assistance.

To continue talking to Dosu, mention @dosu.

juniorsereno commented 2 months ago

@dosu In Easypanel I got this error in the terminal:

bash: nano: command not found

dosubot[bot] commented 2 months ago

To modify the max_iteration_steps parameter in a Dify installation on Easypanel without using the nano command, you can use another text editor available on your system, such as vi or vim. Here are the steps:

  1. Access the Container's File System: Use the terminal feature in Easypanel to access the Dify container.

  2. Locate the fc_agent_runner.py File: If you haven't already located the file, you can use the find command:

    find / -name fc_agent_runner.py
  3. Open the File with vi or vim: Once you locate the fc_agent_runner.py file, open it with vi or vim:

    vi /path/to/fc_agent_runner.py
  4. Modify the max_iteration_steps Parameter: Locate the following line in the run method:

    max_iteration_steps = min(app_config.agent.max_iteration, 5) + 1

    Change the value 5 to your desired maximum number of iteration steps. For example, to set it to 10:

    max_iteration_steps = min(app_config.agent.max_iteration, 10) + 1
  5. Save the Changes and Exit the Editor: In vi or vim, you can do this by pressing Esc, typing :wq, and then pressing Enter.

  6. Restart the Dify Application: Use the Easypanel interface to restart the Dify application. This will apply the changes.

This will update the max_iteration_steps parameter in your Dify installation [1].

To continue talking to Dosu, mention @dosu.

juniorsereno commented 2 months ago

@dosu I need to limit the agent's context window, not the number of interactions with the tool. The context window is unbounded, and it is causing an abusive token bug; I need to limit the context window so the agent does not accumulate too many messages.

dosubot[bot] commented 2 months ago

To limit the agent's context window and prevent it from accumulating too many messages, you can use the AgentHistoryPromptTransform class. This class includes logic to manage the number of messages based on token usage. Specifically, it calculates the maximum token limit and ensures that the accumulated messages do not exceed this limit.

Here is the relevant code snippet that handles this:

def get_prompt(self) -> list[PromptMessage]:
    prompt_messages = []
    num_system = 0
    for prompt_message in self.history_messages:
        if isinstance(prompt_message, SystemPromptMessage):
            prompt_messages.append(prompt_message)
            num_system += 1

    if not self.memory:
        return prompt_messages

    max_token_limit = self._calculate_rest_token(self.prompt_messages, self.model_config)

    model_type_instance = self.model_config.provider_model_bundle.model_type_instance
    model_type_instance = cast(LargeLanguageModel, model_type_instance)

    curr_message_tokens = model_type_instance.get_num_tokens(
        self.memory.model_instance.model, self.memory.model_instance.credentials, self.history_messages
    )
    if curr_message_tokens <= max_token_limit:
        return self.history_messages

    num_prompt = 0
    for prompt_message in self.history_messages[::-1]:
        if isinstance(prompt_message, SystemPromptMessage):
            continue
        prompt_messages.append(prompt_message)
        num_prompt += 1
        if isinstance(prompt_message, UserPromptMessage):
            curr_message_tokens = model_type_instance.get_num_tokens(
                self.memory.model_instance.model, self.memory.model_instance.credentials, prompt_messages
            )
            if curr_message_tokens > max_token_limit:
                prompt_messages = prompt_messages[:-num_prompt]
                break
            num_prompt = 0

    message_prompts = prompt_messages[num_system:]
    message_prompts.reverse()

    prompt_messages = prompt_messages[:num_system]
    prompt_messages.extend(message_prompts)
    return prompt_messages

This method ensures that the number of tokens used by the messages does not exceed the maximum token limit, effectively limiting the context window [1].
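The trimming idea above can be reduced to a short standalone sketch: walk the history from newest to oldest and keep messages until a token budget is exceeded. This is illustrative code only; `count_tokens` is a crude stand-in for the model's real tokenizer, and plain strings stand in for Dify's `PromptMessage` objects.

```python
# Standalone sketch of token-budget trimming, newest messages kept first.
def count_tokens(messages: list[str]) -> int:
    # whitespace word count as a crude tokenizer proxy, for illustration only
    return sum(len(m.split()) for m in messages)

def trim_history(history: list[str], max_token_limit: int) -> list[str]:
    if count_tokens(history) <= max_token_limit:
        return history
    kept: list[str] = []
    # walk backwards so the most recent messages survive the cut
    for message in reversed(history):
        if count_tokens(kept + [message]) > max_token_limit:
            break
        kept.append(message)
    kept.reverse()
    return kept

history = ["hello there", "hi how can I help", "summarize this long document please"]
print(trim_history(history, 8))  # only the newest message fits the budget
```

The real implementation is more careful than this sketch: it trims on user-message boundaries so a user turn and its reply stay together, and it preserves system messages separately.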

Additionally, the FunctionCallAgentRunner class in the Dify platform provides a comprehensive implementation for managing the agent's context and token usage. It includes methods to organize prompt messages, handle tool invocations, and manage the agent's scratchpad. This class ensures that the agent's context is efficiently managed, preventing excessive token usage.

class FunctionCallAgentRunner(BaseAgentRunner):
    # ... other methods ...

    def _organize_prompt_messages(self):
        prompt_template = self.app_config.prompt_template.simple_prompt_template or ""
        self.history_prompt_messages = self._init_system_message(prompt_template, self.history_prompt_messages)
        query_prompt_messages = self._organize_user_query(self.query, [])

        self.history_prompt_messages = AgentHistoryPromptTransform(
            model_config=self.model_config,
            prompt_messages=[*query_prompt_messages, *self._current_thoughts],
            history_messages=self.history_prompt_messages,
            memory=self.memory,
        ).get_prompt()

        prompt_messages = [*self.history_prompt_messages, *query_prompt_messages, *self._current_thoughts]
        if len(self._current_thoughts) != 0:
            # clear messages after the first iteration
            prompt_messages = self._clear_user_prompt_image_messages(prompt_messages)
        return prompt_messages

This implementation ensures that the agent's context window is effectively managed, preventing excessive token usage and maintaining optimal performance [2].

To continue talking to Dosu, mention @dosu.

juniorsereno commented 2 months ago

@dosu I understood practically nothing. I just want to limit the agent's context window to 20 messages. Will an admin still look into my issue?

dosubot[bot] commented 2 months ago

To limit the agent's context window to 20 messages, you can adjust the logic in the AgentHistoryPromptTransform class. Here is an example of how to do it:

class AgentHistoryPromptTransform(PromptTransform):
    # ... other methods and initialization ...

    def get_prompt(self) -> list[PromptMessage]:
        prompt_messages = []
        num_system = 0
        for prompt_message in self.history_messages:
            if isinstance(prompt_message, SystemPromptMessage):
                prompt_messages.append(prompt_message)
                num_system += 1

        if not self.memory:
            return prompt_messages

        max_token_limit = 20  # Set the maximum token limit to 20

        model_type_instance = self.model_config.provider_model_bundle.model_type_instance
        model_type_instance = cast(LargeLanguageModel, model_type_instance)

        curr_message_tokens = model_type_instance.get_num_tokens(
            self.memory.model_instance.model, self.memory.model_instance.credentials, self.history_messages
        )
        if curr_message_tokens <= max_token_limit:
            return self.history_messages[:max_token_limit]  # Ensure the history messages do not exceed the limit

        num_prompt = 0
        for prompt_message in self.history_messages[::-1]:
            if isinstance(prompt_message, SystemPromptMessage):
                continue
            prompt_messages.append(prompt_message)
            num_prompt += 1
            if isinstance(prompt_message, UserPromptMessage):
                curr_message_tokens = model_type_instance.get_num_tokens(
                    self.memory.model_instance.model, self.memory.model_instance.credentials, prompt_messages
                )
                if curr_message_tokens > max_token_limit:
                    prompt_messages = prompt_messages[:-num_prompt]
                    break
                num_prompt = 0

        message_prompts = prompt_messages[num_system:]
        message_prompts.reverse()

        prompt_messages = prompt_messages[:num_system]
        prompt_messages.extend(message_prompts)
        return prompt_messages[:max_token_limit]  # Ensure the final prompt messages do not exceed the limit

This code sets max_token_limit to 20 and ensures that the number of messages in the result does not exceed that limit [1][2].
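Note that the snippet above treats 20 as a token limit, while the question asks for a limit of 20 messages. What the question actually describes can be sketched more directly: keep only the last N non-system messages while preserving system prompts at the front. This is an illustrative sketch using plain dicts, not Dify's actual `PromptMessage` types, and `limit_context_window` is a hypothetical helper.

```python
# Hedged sketch: cap the history at the N most recent non-system messages,
# keeping system prompts intact at the front.
MAX_HISTORY_MESSAGES = 20  # assumed limit from the question

def limit_context_window(history: list[dict], max_messages: int = MAX_HISTORY_MESSAGES) -> list[dict]:
    system = [m for m in history if m["role"] == "system"]
    others = [m for m in history if m["role"] != "system"]
    return system + others[-max_messages:]

history = [{"role": "system", "content": "You are helpful."}] + [
    {"role": "user", "content": f"msg {i}"} for i in range(50)
]
trimmed = limit_context_window(history)
print(len(trimmed))  # 21: 1 system prompt + the 20 most recent messages
```

A message-count cap like this is simpler to reason about than a token budget, but long individual messages can still blow past a model's context size, so production code usually combines both checks.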

If you need more help, one of the maintainers can take a closer look at your case. @takatost, could you help with this question?

To continue talking to Dosu, mention @dosu.

juniorsereno commented 2 months ago

Waiting for @takatost. I would really like a limit on the agent's context window, because this is the best and easiest agent builder I have ever used. I love the agent, but without a context window the token consumption becomes madness.

crazywoola commented 2 months ago

Not only the agent but also the basic chatbot has this problem; however, we do not plan to fix this at the moment.