microsoft / autogen

A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
https://microsoft.github.io/autogen/

[Bug]: MessageTokenLimiter ignored for output of Tools #2469

Open daanraman opened 2 months ago

daanraman commented 2 months ago

Describe the bug

MessageTokenLimiter works as expected for messages that are not generated by Tools. However, when Tools are used, their outputs do not appear to be truncated before being sent to the model.

Steps to reproduce

Model Used

gpt-3.5-turbo

Expected Behavior

The truncated tool output should be sent to the GPT model. Instead, the entire output is sent, which triggers a rate-limit error.
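For context, here is a minimal sketch of the kind of setup involved. It assumes the `transform_messages` capability from AutoGen v0.2; the tool, agent names, and token limits are illustrative, not taken from my actual code:

```python
import autogen
from autogen.agentchat.contrib.capabilities import transform_messages, transforms

llm_config = {"config_list": [{"model": "gpt-3.5-turbo", "api_key": "..."}]}

assistant = autogen.AssistantAgent("assistant", llm_config=llm_config)
user_proxy = autogen.UserProxyAgent(
    "user_proxy", human_input_mode="NEVER", code_execution_config=False
)

# Illustrative tool that returns a very large payload.
@user_proxy.register_for_execution()
@assistant.register_for_llm(description="Fetch a large API response.")
def fetch_data() -> str:
    return "x" * 100_000  # stand-in for a large tool output

# Cap each message before it is sent to the LLM; tool outputs
# appear to bypass this limit.
limiter = transform_messages.TransformMessages(
    transforms=[transforms.MessageTokenLimiter(max_tokens_per_message=1000)]
)
limiter.add_to_agent(assistant)

user_proxy.initiate_chat(assistant, message="Call fetch_data and summarize it.")
```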

Screenshots and logs

In the screenshots, I show that even though the print statement says the tokens were limited, the non-truncated output appears to be sent to the GPT model.

Output shows that the output of my tool is correctly being truncated: [screenshot]

This seems to be ignored when calling the LLM, though, resulting in a rate-limit error: [screenshot]

Additional Information

No response

sonichi commented 2 months ago

Thanks. If you make a PR, please add @gagb and @WaelKarkoub as reviewers.

WaelKarkoub commented 2 months ago

Hi @daanraman, thanks for your feedback. I believe the accuracy of tool outputs is crucial, and truncating them might omit valuable information, so this behavior may reflect a design decision rather than a bug. That said, there are possible solutions. I'm working on a PR that applies LLMLingua for text compression, which might be a better fit for managing tool outputs, though I'm not certain of its effectiveness. We could also consider a new transform specifically for tool outputs that truncates differently, e.g. from the middle, to preserve context at both ends (a rough sketch follows below). Let me know what you think.
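For illustration, here is a rough sketch of what such a tool-output transform could look like. It is not a shipped AutoGen transform; it assumes the `MessageTransform` protocol (`apply_transform` / `get_logs`) used by `TransformMessages`, and assumes tool results arrive as `role == "tool"` messages with string content (real messages may nest results under `tool_responses`):

```python
from typing import Any, Dict, List, Tuple


class ToolOutputMiddleTruncator:
    """Hypothetical transform: truncate the middle of oversized tool
    outputs, keeping head and tail so context at both ends survives."""

    def __init__(self, max_chars: int = 4000):
        self._max_chars = max_chars

    def apply_transform(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        out = []
        for msg in messages:
            msg = dict(msg)  # avoid mutating the caller's message history
            content = msg.get("content")
            if (
                msg.get("role") == "tool"
                and isinstance(content, str)
                and len(content) > self._max_chars
            ):
                half = self._max_chars // 2
                msg["content"] = content[:half] + "\n...[truncated]...\n" + content[-half:]
            out.append(msg)
        return out

    def get_logs(self, pre_transform_messages, post_transform_messages) -> Tuple[str, bool]:
        changed = pre_transform_messages != post_transform_messages
        log = (
            "Truncated the middle of oversized tool outputs."
            if changed
            else "No tool outputs exceeded the limit."
        )
        return log, changed
```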

daanraman commented 2 months ago

Hi @sonichi / @WaelKarkoub - thanks for the quick feedback, appreciated.

I understand the reasoning behind not truncating tool output. If that's a design choice, though, I find it confusing that the logs suggest the tokens were truncated when that doesn't actually happen.

The output of the tool itself should fit into the context window of the LLM I am using. The main reason I was trying to truncate the tool output is to avoid filling up the history sent to the LLM with large tool outputs, which later steps don't need in order to understand the conversation (they should rely on the answer produced by the agent that used the tool).

So my questions are: 1) is the tool output of previous steps included in the history window at later steps in the conversation, or is it excluded? And 2) if it is in fact included in later steps (and thus fills up the context window with tool output), is using Nested Chats a way to "hide" the tool output of previous steps?

WaelKarkoub commented 2 months ago

@daanraman I see where the confusion lies now: the logs indicate truncation of message content without accounting for tool outputs. I'll open a PR to clarify the logging for this transform. Out of curiosity, are you applying the MessageTokenLimiter across all your agents?

1) I looked through the code base, and tool calls and tool responses are appended to the context window. https://github.com/microsoft/autogen/blob/b7366b570fac189c66d9642e65f431cd43632239/autogen/agentchat/conversable_agent.py#L558-L562

2) You can still run into the same issue if the nested chats generate large responses to each other; a sketch of that setup follows below.
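For question 2, here is a rough sketch of the nested-chat idea (agent names are hypothetical, and as noted above, the inner chat itself still needs its own token limiting if its messages grow large):

```python
import autogen

llm_config = {"config_list": [{"model": "gpt-3.5-turbo", "api_key": "..."}]}

outer = autogen.AssistantAgent("outer", llm_config=llm_config)
tool_user = autogen.AssistantAgent("tool_user", llm_config=llm_config)
executor = autogen.UserProxyAgent(
    "executor", human_input_mode="NEVER", code_execution_config=False
)
# ... register the tools on tool_user / executor as usual ...

# When `outer` receives a message, run a nested chat that does the tool
# work; only the nested chat's summary flows back to the outer
# conversation, so the raw tool output stays "hidden" from it.
outer.register_nested_chats(
    [
        {
            "recipient": tool_user,
            "sender": executor,
            "summary_method": "reflection_with_llm",
            "max_turns": 2,
        }
    ],
    trigger=lambda sender: sender not in (tool_user, executor),
)
```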

If possible, consider creating a custom transform that extracts only the essential information from tool outputs. This approach could add value to AutoGen. Would you be interested in collaborating on this?
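As a starting point, such an extraction transform could follow the same protocol as the truncation sketch above, replacing all but the most recent tool output with a condensed version. The `summarize` helper here is a placeholder; a real one might call a cheap model or a domain-specific extractor:

```python
from typing import Any, Dict, List, Tuple


def summarize(text: str, limit: int = 200) -> str:
    # Placeholder "extraction": keep only the first `limit` characters.
    return text[:limit] + ("..." if len(text) > limit else "")


class StaleToolOutputCondenser:
    """Hypothetical transform: condense every tool output except the
    most recent one, so earlier results stop consuming context."""

    def apply_transform(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        tool_indices = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
        last = tool_indices[-1] if tool_indices else -1
        out = []
        for i, msg in enumerate(messages):
            msg = dict(msg)
            if msg.get("role") == "tool" and i != last and isinstance(msg.get("content"), str):
                msg["content"] = summarize(msg["content"])
            out.append(msg)
        return out

    def get_logs(self, pre_transform_messages, post_transform_messages) -> Tuple[str, bool]:
        changed = pre_transform_messages != post_transform_messages
        return (
            "Condensed stale tool outputs." if changed else "No stale tool outputs found.",
            changed,
        )
```

It could then be stacked with the existing limiter, e.g. `TransformMessages(transforms=[StaleToolOutputCondenser(), transforms.MessageTokenLimiter(max_tokens=3000)])`.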

daanraman commented 2 months ago

@WaelKarkoub thanks for the time & feedback. Correct, I was applying the MessageTokenLimiter to all agents. My approach for now is indeed to manually change my custom tools (which interact with API endpoints) so that they return more concise information, to avoid overrunning context windows.

Still very new to AutoGen (moved away from crew.ai yesterday), but I am liking it very much so far: great documentation and examples, and in general the agents behave better by default, I feel (better system prompts and history management). The group chat features are great too.

I will certainly consider contributing once I am a bit more familiar with both the framework & the codebase!