I will handle this problem in microsoft/FLAML#1153. The problem should be in `generate_reply`, when it returns extra-long messages. My current plan includes the following functionalities:
- Use tiktoken for a more accurate token count, and add a static function that checks the tokens left given the model and previous messages (a sketch follows this list).
- Allow the user to pass in a predefined output limit.
- When the generated output (for example, from code execution) exceeds the max tokens allowed or the user-predefined limit, return a long-result error.
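A minimal sketch of what such a token-count helper could look like, assuming tiktoken; the name `count_tokens_left` and the per-model limits are illustrative, not the actual FLAML API:

```python
import tiktoken

# Assumed per-model context windows; illustrative values only.
MODEL_MAX_TOKENS = {"gpt-3.5-turbo": 4096, "gpt-4": 8192}

def count_tokens_left(model: str, messages: list) -> int:
    """Return roughly how many tokens remain for a completion."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")  # fallback encoding
    used = sum(len(encoding.encode(m.get("content") or "")) for m in messages)
    return MODEL_MAX_TOKENS.get(model, 4096) - used
```

(A precise chat-completion count would also add the per-message overhead tokens; this sketch omits that.)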
@thinkall has implemented the tiktoken count in microsoft/FLAML#1158. Should I try to fix this concurrently?
Your proposal can solve part of the problem. It does the check on the sender's side in case the receiver requests a length limit. There can be other alternatives.
It'll be good to figure out what we want to support and have a comprehensive design. Could you discuss with @thinkall and @LeoLjl? You are in the same time zone. Once you have a proposal, @qingyun-wu and I can go over it.
Sure, I will discuss it with @thinkall and @LeoLjl.
I just updated microsoft/FLAML#1153 to allow the user to set a predefined token limit for outputs from code or function calls; I think this is a different task from handling the token limit in `oai_reply`.
@sonichi @qingyun-wu Here is my proposed plan:
On `AssistantAgent`: add a parameter `on_token_limit` from `["Terminate", "Compress"]`. We would check whether the token limit is reached before `oai.create` is called. If set to "Terminate", we would terminate the conversation. If set to "Compress", we would use a compress agent to compress previous messages and prepare for future conversations (we could also set a threshold, like 80% of max tokens, to start an async agent). I read that OpenAI summarizes previous messages if they get too long.
On `UserProxyAgent` (I already added this in https://github.com/microsoft/FLAML/pull/1153): allow the user to specify `auto_reply_token_limit`, defaulting to -1 (no limit). When `auto_reply_token_limit > 0` and the token count from an auto reply (code execution or function call) exceeds the limit, the output will be replaced with an error message. This lets users prevent unexpected cases where the output from code execution or function calls overflows.
From the two changes above, all three `generate_reply` cases are addressed: `oai_reply`, code execution, and function call (both changes are sketched after this proposal). I am thinking of general tasks like problem-solving. @BeibinLi likes the "compression" and "terminate" approach.
For tasks that involve databases and consume a large number of tokens, like answering questions over a long text or searching for data in a database, I think we need a special design targeting those applications.
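A minimal sketch of the two proposed behaviors, reusing the `count_tokens_left` helper above; `handle_token_limit`, `clip_auto_reply`, and the inline summarizer are hypothetical placeholders, not FLAML code:

```python
import tiktoken

def handle_token_limit(messages, model, on_token_limit="Terminate"):
    """Check the limit before oai.create; terminate or compress as configured."""
    if count_tokens_left(model, messages) > 0:
        return messages  # under the limit, proceed unchanged
    if on_token_limit == "Terminate":
        raise RuntimeError("Token limit reached; terminating the conversation.")
    # "Compress": crude stand-in; a real compress agent would summarize here.
    summary = " ".join((m.get("content") or "")[:200] for m in messages[:-1])
    return [{"role": "system", "content": f"Summary: {summary}"}, messages[-1]]

def clip_auto_reply(output: str, limit: int = -1) -> str:
    """Replace an over-long code/function output with an error message."""
    encoding = tiktoken.get_encoding("cl100k_base")
    if limit > 0 and len(encoding.encode(output)) > limit:
        return f"Error: the output exceeded the token limit of {limit}."
    return output
```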
The proposal is a good start. I like the design that covers two options: dealing with the token limit before or after a reply is made. I think we can generalize this design.
On second thought, I don't think we need to pass a `token_limit` argument. Currently, for function and code execution, I use a class variable `auto_reply_token_limit` to customize behavior when the limit is reached. When a new agent overrides this behavior, it can employ this variable or just create a new class variable.
Should the sender tell the receiver the token limit? `token_limit` and the ways to handle it should be separated: `token_limit` is a number that should be sent by the sender, so maybe we can make it a field in the message. The way to handle `token_limit` is decided in the auto-reply method.
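For example, the sender's limit could travel as a field of the message itself; the `token_limit` key below is the proposal under discussion, not an existing message schema:

```python
# Hypothetical message carrying the sender's limit alongside the OAI fields.
message = {
    "role": "assistant",
    "content": "Here is the output of the code execution...",
    "token_limit": 1000,  # the receiver's reply should stay under this
}
```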
I have a few questions when looking at the code:

- In the `receive` function, it calls `generate_reply` without passing in messages: `self.generate_reply(sender=sender)`, so `messages` will be None. When a registered method such as `generate_oai_reply` is called, `messages` will be None and it takes out the pre-stored messages: `if messages is None: messages = self._oai_messages[sender]`. It seems that this `messages` argument is not used. When would it be used? One possible usage: when `generate_reply` is called individually (see the sketch below).
- The `context` argument passed to `register_auto_reply` seems more appropriate to rename to `reply_config`? In `oai_reply` it is converted to `llm_config`, and in code execution it is converted to `code_execution_config`; in other reply methods it is not used. It is also confusing that "context" can be a field in a message from OAI, just as "content" is a field in the message.
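A hedged sketch of that "called individually" usage; the import path and the `llm_config` contents are assumptions based on the FLAML-era package layout:

```python
from flaml.autogen import AssistantAgent

# Call generate_reply directly with explicit messages, bypassing the
# pre-stored self._oai_messages[sender].
assistant = AssistantAgent("assistant", llm_config={"model": "gpt-4"})
reply = assistant.generate_reply(
    messages=[{"role": "user", "content": "What is 1 + 1?"}]
)
print(reply)
```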
Good questions. Regarding 1, yes, `messages` will be used when `generate_reply` is called individually. We can revise the calling usage in the `receive` function to make it pass `messages`, to avoid this confusion.
Regarding 2, we can rename it into `config` if we want to avoid the confusion. One thing to note is that this variable could be updated in the reply function to maintain some state. I wanted to use it in other methods too but haven't done the refactoring. @ekzhu is it OK to rename `context` into `config` in `generate_reply()`?
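To make the `context`-vs-`config` question concrete, here is a standalone illustration (not FLAML code) of the pattern being discussed: each registered reply function receives a mutable dict that carries its configuration and can also hold state across calls:

```python
# Minimal stand-in for the register_auto_reply / generate_reply pattern.
reply_handlers = []

def register_auto_reply(reply_func, context=None):
    reply_handlers.append((reply_func, context or {}))

def generate_reply(messages):
    for reply_func, context in reply_handlers:
        final, reply = reply_func(messages, context=context)
        if final:
            return reply
    return None

def count_and_reply(messages, context=None):
    context["calls"] = context.get("calls", 0) + 1  # state kept in the dict
    return True, f"reply #{context['calls']} to {len(messages)} message(s)"

register_auto_reply(count_and_reply, context={})
print(generate_reply([{"role": "user", "content": "hi"}]))  # reply #1 to 1 message(s)
```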
microsoft/FLAML#1098, microsoft/FLAML#1153, and microsoft/FLAML#1158 each address this in some specialized way. Can we integrate these ideas into a generic solution and make `AssistantAgent` able to overcome this limitation out of the box?