microsoft / autogen

A programming framework for agentic AI πŸ€–
https://microsoft.github.io/autogen/
Creative Commons Attribution 4.0 International
33.41k stars 4.84k forks

agent that both proposes and executes tools #2223

Open sonichi opened 7 months ago

sonichi commented 7 months ago

> I like the concept of AutoGen and would like to use a couple of its features. But I currently just need a simple tool executor.
>
> I notice that AutoGen leans heavily on the concept of generating code and executing it. Does it not have a simple way of executing a tool directly after it has been "selected", i.e. basically the same as OpenAI function calling? Does it all use function calling?
>
> I ask because for my use case it's sometimes enough to just select the correct tool, with no need for extra checks etc.

Originally posted by @WebsheetPlugin in https://github.com/microsoft/autogen/discussions/2208

Suggestion: create a reply function `propose_and_execute_tools_nested_reply` that uses a nested chat between an AssistantAgent and a UserProxyAgent with `human_input_mode="NEVER"`.

cc @qingyun-wu

shippy commented 7 months ago

Doesn't this work already? I frequently use `register_function` with the same agent assigned to both caller and executor; the latter is a little misleadingly named, because such an agent doesn't even need a code-execution configuration to use the tool.
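
In autogen 0.2 that pattern is `register_function(f, caller=agent, executor=agent)`. A minimal library-free stand-in (the `ToyAgent` class and `get_weather` tool are hypothetical, just to show the mechanics of one agent holding both roles):

```python
# Stand-in sketch of the "same agent is caller and executor" pattern.
# Real code would use autogen's register_function(f, caller=agent, executor=agent);
# ToyAgent here just mimics ConversableAgent's function_map.

class ToyAgent:
    def __init__(self, name):
        self.name = name
        self.function_map = {}  # mirrors ConversableAgent.function_map

    def register_tool(self, fn):
        # With caller == executor, the same map serves proposal and execution.
        self.function_map[fn.__name__] = fn

    def execute_tool_call(self, call):
        fn = self.function_map[call["name"]]
        return fn(**call["arguments"])

def get_weather(city: str) -> str:
    """Toy tool used for illustration."""
    return f"Sunny in {city}"

agent = ToyAgent("worker")
agent.register_tool(get_weather)
print(agent.execute_tool_call({"name": "get_weather", "arguments": {"city": "Paris"}}))
# Sunny in Paris
```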

GeorgSatyros commented 7 months ago

It works, in an unorthodox way. In your case the same agent would be selected to speak twice, given that it is the only one with access to the tool in question. So assuming the default group-chat flexibility in agent selection, it works. The moment you start dictating agent order, though, it will fail. I have implemented a workaround using Society of Mind (SoM) agents (a single agent composed of multiple agents under the hood): one SoM agent is called, which under the hood calls a tool caller and then a tool executor, returning the result. I could make a PR with it, but I feel a more integrated solution would be preferable, as SoM is experimental.

sonichi commented 7 months ago

@shippy it works in group chat, but it takes two messages in the group chat: one for tool proposal and one for tool execution. @WebsheetPlugin probably wants to encapsulate tool proposal and execution inside one agent's inner conversation and return only a single result message to the outer chat.

@GeorgSatyros thanks for sharing your experience. The SoM agent is experimental, while nested chat is in the core library. I suggest using nested chat to implement a new agent with the same functionality: https://microsoft.github.io/autogen/docs/tutorial/conversation-patterns#nested-chats It should be very easy to implement with a single `register_nested_chats` call. Would you like to give it a try?
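
The shape of what's being proposed can be sketched without the library (real code would use `ConversableAgent.register_nested_chats` as in the linked tutorial; the function names below are hypothetical):

```python
# Conceptual sketch of encapsulating tool proposal + execution in a nested
# chat: two inner turns happen under the hood, the outer chat sees one message.

def propose_tool_call(task):
    # Inner "assistant" role: turn the task into a tool-call proposal.
    # Hardcoded here; a real AssistantAgent would ask an LLM.
    return {"name": "add", "arguments": {"a": 2, "b": 3}}

def execute_tool_call(call, tools):
    # Inner "user proxy" with human_input_mode="NEVER": runs it silently.
    return tools[call["name"]](**call["arguments"])

def self_executing_agent(task, tools):
    # The two inner turns are hidden; only ONE message reaches the outer chat.
    call = propose_tool_call(task)
    result = execute_tool_call(call, tools)
    return f"Result: {result}"

tools = {"add": lambda a, b: a + b}
print(self_executing_agent("add 2 and 3", tools))  # Result: 5
```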

shippy commented 7 months ago

Odd: pretty sure all my use cases have been in multi-agent chats with defined transitions between agents, where the tool-executing agent wasn't allowed to speak twice (I think - I'll double-check)

GeorgSatyros commented 7 months ago

> Odd: pretty sure all my use cases have been in multi-agent chats with defined transitions between agents, where the tool-executing agent wasn't allowed to speak twice (I think - I'll double-check)

@shippy I could definitely have elaborated more above, so let me fix that! Basically, assuming you are using the `speaker_selection_method` parameter in group chats and not overriding/shadowing the speaker-selection method itself, the system should work. But the core of the problem is this: under the hood, the system will not actually follow your defined transitions. If you want `agent_A -> tool_agent_B -> agent_C`, the actual order will be `agent_A -> tool_agent_B -> tool_agent_B -> agent_C`. This may be a non-issue for projects with looser requirements on agent order, but for other projects it will be a problem. It is also a relatively unintuitive and inexplicit pattern, as is evident in how one may disallow agents speaking twice via the `allow_repeat_speaker` flag, yet the system will ignore that when it comes to tool execution. Here's the relevant code snippet for the above as reference: snippet

@sonichi Sure, I was looking for an excuse to dive deeper into nested chats anyway! Is there any planned support for SoM going forward or will the "agent composed of agents" niche be fulfilled by nested chats? Asking because I was considering contributing to SoM and that effort may be better spent on the core component instead.

ChristianWeyer commented 7 months ago

> @shippy it works in group chat, but it takes two messages in the group chat: one for tool proposal and one for tool execution. @WebsheetPlugin probably wants to encapsulate tool proposal and execution inside one agent's inner conversation and return only a single result message to the outer chat.
>
> @GeorgSatyros thanks for sharing your experience. The SoM agent is experimental, while nested chat is in the core library. I suggest using nested chat to implement a new agent with the same functionality: https://microsoft.github.io/autogen/docs/tutorial/conversation-patterns#nested-chats It should be very easy to implement with a single `register_nested_chats` call. Would you like to give it a try?

Would be great to have a sample for this πŸ‘πŸΌ.

ChristianWeyer commented 7 months ago

> @shippy it works in group chat, but it takes two messages in the group chat: one for tool proposal and one for tool execution. @WebsheetPlugin probably wants to encapsulate tool proposal and execution inside one agent's inner conversation and return only a single result message to the outer chat. @GeorgSatyros thanks for sharing your experience. The SoM agent is experimental, while nested chat is in the core library. I suggest using nested chat to implement a new agent with the same functionality: https://microsoft.github.io/autogen/docs/tutorial/conversation-patterns#nested-chats It should be very easy to implement with a single `register_nested_chats` call. Would you like to give it a try?
>
> Would be great to have a sample for this πŸ‘πŸΌ.

I guess we already have it ☺️ https://microsoft.github.io/autogen/docs/notebooks/agentchat_nested_chats_chess

sonichi commented 6 months ago

@GeorgSatyros it'll be great if you could make a PR to reimplement the SoM agent using nested chat. It'll be easier to maintain. The current SoM agent can retire after feature parity.

GeorgSatyros commented 6 months ago

@sonichi I agree, that would be a more graceful solution than deprecation. Will be opening a PR with a solution as soon as my schedule allows!

WebsheetPlugin commented 6 months ago

> Odd: pretty sure all my use cases have been in multi-agent chats with defined transitions between agents, where the tool-executing agent wasn't allowed to speak twice (I think - I'll double-check)
>
> @shippy I could definitely have elaborated more above, so let me fix that! Basically, assuming you are using the `speaker_selection_method` parameter in group chats and not overriding/shadowing the speaker-selection method itself, the system should work. But the core of the problem is this: under the hood, the system will not actually follow your defined transitions. If you want `agent_A -> tool_agent_B -> agent_C`, the actual order will be `agent_A -> tool_agent_B -> tool_agent_B -> agent_C`. This may be a non-issue for projects with looser requirements on agent order, but for other projects it will be a problem. It is also a relatively unintuitive and inexplicit pattern, as is evident in how one may disallow agents speaking twice via the `allow_repeat_speaker` flag, yet the system will ignore that when it comes to tool execution. Here's the relevant code snippet for the above as reference: snippet
>
> @sonichi Sure, I was looking for an excuse to dive deeper into nested chats anyway! Is there any planned support for SoM going forward or will the "agent composed of agents" niche be fulfilled by nested chats? Asking because I was considering contributing to SoM and that effort may be better spent on the core component instead.

Yes, I use speaker selection to decide the order of agents. And to me it seemed unintuitive to spend a round here just to execute the tool by the same agent again, as I already had logic to "select" the next agent in order. So my solution was: if the last message is a tool selection, just allow the same agent (the one with the tool attached) to speak again.

But again, to me all of this seemed unintuitive. It's still not clear to me why another agent should execute the tool, or why it should not be executed at all once selected.

Is there a case where you want to select a tool by Agent A and execute it by Agent B or C or not execute it at all?

sonichi commented 6 months ago

> Odd: pretty sure all my use cases have been in multi-agent chats with defined transitions between agents, where the tool-executing agent wasn't allowed to speak twice (I think - I'll double-check)
>
> @shippy I could definitely have elaborated more above, so let me fix that! Basically, assuming you are using the `speaker_selection_method` parameter in group chats and not overriding/shadowing the speaker-selection method itself, the system should work. But the core of the problem is this: under the hood, the system will not actually follow your defined transitions. If you want `agent_A -> tool_agent_B -> agent_C`, the actual order will be `agent_A -> tool_agent_B -> tool_agent_B -> agent_C`. This may be a non-issue for projects with looser requirements on agent order, but for other projects it will be a problem. It is also a relatively unintuitive and inexplicit pattern, as is evident in how one may disallow agents speaking twice via the `allow_repeat_speaker` flag, yet the system will ignore that when it comes to tool execution. Here's the relevant code snippet for the above as reference: snippet @sonichi Sure, I was looking for an excuse to dive deeper into nested chats anyway! Is there any planned support for SoM going forward or will the "agent composed of agents" niche be fulfilled by nested chats? Asking because I was considering contributing to SoM and that effort may be better spent on the core component instead.
>
> Yes, I use speaker selection to decide the order of agents. And to me it seemed unintuitive to spend a round here just to execute the tool by the same agent again, as I already had logic to "select" the next agent in order. So my solution was: if the last message is a tool selection, just allow the same agent (the one with the tool attached) to speak again.
>
> But again, to me all of this seemed unintuitive. It's still not clear to me why another agent should execute the tool, or why it should not be executed at all once selected.
>
> Is there a case where you want to select a tool by Agent A and execute it by Agent B or C, or not execute it at all?

For example, it makes it possible for Agent B to perform extra conversations with other agents or humans before executing.
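
One way to picture this separation: the executor can gate the proposed call before running it. In autogen this is typically a `UserProxyAgent` with `human_input_mode="ALWAYS"` standing between proposal and execution; the sketch below replaces that with a plain `approve` callback (the names are illustrative, not library API):

```python
# Why split proposal from execution: the executor can veto or delay the call.

def guarded_execute(call, tools, approve):
    # `approve` stands in for a human or another agent reviewing the proposal.
    if not approve(call):
        return "Tool call rejected by reviewer"
    return tools[call["name"]](**call["arguments"])

tools = {"delete_db": lambda: "database deleted"}
call = {"name": "delete_db", "arguments": {}}

print(guarded_execute(call, tools, approve=lambda c: False))  # Tool call rejected by reviewer
print(guarded_execute(call, tools, approve=lambda c: True))   # database deleted
```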

WebsheetPlugin commented 6 months ago

Ahhh, now I get it. I am actually counting each token twice :) so this would not be my use case, but finally I understand it. Makes sense.

Just an observation: I feel that AutoGen is pivoted toward use cases where many tokens are being used, and it kind of leaves out simpler use cases like the one mentioned above.

sonichi commented 6 months ago

> Ahhh, now I get it. I am actually counting each token twice :) so this would not be my use case, but finally I understand it. Makes sense.
>
> Just an observation: I feel that AutoGen is pivoted toward use cases where many tokens are being used, and it kind of leaves out simpler use cases like the one mentioned above.

I think the self-executing agent based on nested chat is what you need. What do you think?

ChristianWeyer commented 6 months ago

> Ahhh, now I get it. I am actually counting each token twice :) so this would not be my use case, but finally I understand it. Makes sense. Just an observation: I feel that AutoGen is pivoted toward use cases where many tokens are being used, and it kind of leaves out simpler use cases like the one mentioned above.
>
> I think the self-executing agent based on nested chat is what you need. What do you think?

Can you point us to docs or a sample for this, @sonichi? Thx.

ekzhu commented 6 months ago

@ChristianWeyer see https://microsoft.github.io/autogen/docs/notebooks/agentchat_nested_chats_chess

ChristianWeyer commented 6 months ago

> @ChristianWeyer see https://microsoft.github.io/autogen/docs/notebooks/agentchat_nested_chats_chess

OK, the same I linked to above πŸ€“ - thx!

sonichi commented 6 months ago

Maybe we can make a special agent class that does self-execution using nested chat out of the box.

scruffynerf commented 4 months ago

By default, tool calls are sent onward to be processed. The alternative (non-nested) is to catch the tool calls before send, run them, and add the results back to the original message. Nested chat is cleaner, but not the minimal case.

The only negative is if you want the agent to loop: invoke tool, react to tool results, invoke tool, react to tool results... I'd be cautious of doing that without a way for some other agent/user to have a say in there, or you risk a runaway agent.

But "invoke tool, add the tool results to the message that invoked them (clearing the tool calls from the message as processed, so the next agent won't redo them), and let someone else have a turn" could be the true minimal "self-tool-executing" agent.
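
That minimal case can be sketched as a plain function (in autogen 0.2 this logic would live in a `process_message_before_send` hook; the function and tool names here are illustrative):

```python
# Minimal "self-tool-executing" step: run the outgoing message's tool_calls,
# strip them so no other agent re-executes them, and merge the results into
# the message content before anyone else gets a turn.
import json

def run_tools_before_send(message, tools):
    results = []
    for call in message.pop("tool_calls", []):  # pop clears the calls
        fn = tools[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append(str(fn(**args)))
    # Merge tool output into the single message the next agent will see.
    message["content"] = (message.get("content") or "") + "\n".join(results)
    return message

tools = {"square": lambda x: x * x}
msg = {
    "content": "",
    "tool_calls": [{"function": {"name": "square", "arguments": '{"x": 4}'}}],
}
print(run_tools_before_send(msg, tools))  # {'content': '16'}
```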

GeorgSatyros commented 4 months ago

@scruffynerf that is exactly what I'm trying to do at the moment. It is quite easy if you are only interested in the tool output being within the "context" of the response. But adding and executing proper "tool_calls" in the message is much trickier, as OpenAI effectively requires two messages in the chat history per tool execution. As such, you would need an agent that injects multiple messages into the conversation per call, and so they cannot be included in a single "send". That is probably why a nested chat may be the more graceful solution, where two agents do the execution under the hood instead. I am still working on a good, generic solution, as this has been quite a persistent thorn in my team's side.
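
The two-message requirement refers to the OpenAI Chat Completions message format: one assistant message carrying `tool_calls`, then one `role: "tool"` message per call carrying the result, linked by `tool_call_id`:

```python
# The two chat-history entries OpenAI's API expects per tool execution,
# which is why a single agent can't inject the result in one "send".
assistant_msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "square", "arguments": '{"x": 4}'},
    }],
}
tool_msg = {
    "role": "tool",
    "tool_call_id": "call_1",  # must reference the id in assistant_msg
    "content": "16",
}
history = [assistant_msg, tool_msg]  # two messages for one tool call
print(len(history))  # 2
```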

scruffynerf commented 4 months ago

> that is exactly what I'm trying to do at the moment. It is quite easy if you are only interested in the tool output being within the "context" of the response. But adding and executing proper "tool_calls" in the message is much trickier, as OpenAI effectively requires two messages in the chat history per tool execution. As such, you would need an agent that injects multiple messages into the conversation per call, and so they cannot be included in a single "send". That is probably why a nested chat may be the more graceful solution, where two agents do the execution under the hood instead. I am still working on a good, generic solution, as this has been quite a persistent thorn in my team's side.

The method I'm using for 'toolsfortoolless' (adding tool-response processing in pre/post API hooks for models/services that don't support tool_calls) can be used for this, if it fits your use case. Unsure.

https://github.com/microsoft/autogen/pull/2966#issuecomment-2177759814 is my flow of the parts (I haven't committed my PR yet, soon I hope, still tweaking it)

In it, I hook `process_message_before_send` to add the tool_calls. You could do the same to process the tool calls, that is, to self-execute them, and remove the tool_call so nobody else sees it. Yes, you then have multiple messages (the content, if any, and the results of the tool calls), and then you can merge the results into ONE complete message (which I do in the fix_messages part of the flow, and which might need to be self-standing to allow other LLMs in the chat to handle the message flow in their templates).

While the OpenAI API has trained in a "tool call" OR "content" behavior, they admit it's not actually a hardcoded OR, just trained in. Officially the spec does allow for both at once; I linked to a discussion saying so elsewhere. So you could even prompt it to do something like this:

LLM chatting here, explaining and rambling on why and what it will do...
[tool_call id, or some other way to know where the tool-call results belong, generated by the LLM itself]
and then when the important thing happened, that date matters:
[Insert tool use: searchtool(argument='date of important thing')]
etc. etc. etc.

Or maybe you want two known tool_calls merged. Example prompt: "call the 'next free date' tool to get an available date, and reserve it. Also use the tool that books catered lunches, but set the date to 1/1/1970."

And then you get two tool_calls: you process the next-free-date function, which returns {'next unscheduled day': "7/1/2024", 'Status': "now reserved"}, catch the second tool call, drop in the date, and THEN process it, and now you have a lunch delivery on 7/1/2024. You delete the tool_calls and insert text into the final message before send to indicate the results:

Next free date: 7/1/2024
Lunch booked for 7/1/2024, confirmation number XJ234

And THAT text would be what everyone, including the calling LLM, would see, and it would look like that LLM did as asked. Seamlessly self-executed.

That would otherwise have been at least 2-3 LLM calls: a request asking to reserve a date and book a lunch; a reply to use the reservation tool; a request back with the date now reserved (tool processed); a reply back to use the lunch tool with the date; (optional but likely) a request back to confirm lunch is scheduled (tool processed); and a reply back "OK, on 7/1/2024, you are now booked and lunch will be delivered."

As I said, NOT ideal, but it could be a huge saving if you can cut LLM calls in half, or more, for very fixed processes you can predict and merge together without the LLM managing them.

Not ideal; the call/response method of even a nested chat with an LLM-less tool-executor proxy lets the LLM take the results and pretty them up, but yes, it's extra calls back and forth. If you don't need that, you can short-circuit it.
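
The reserve-then-book flow above can be sketched as a fixed pipeline (the tool names, the 1/1/1970 placeholder, and the return values come from the example in this comment; the merge function itself is hypothetical):

```python
# Sketch of the two-tool-call merge: run the first tool, splice its result
# into the second call's placeholder argument, run that, and emit ONE
# plain-text summary instead of multiple LLM round trips.

def next_free_date():
    # First tool call, as in the example above.
    return {"next unscheduled day": "7/1/2024", "Status": "now reserved"}

def book_lunch(date):
    # Second tool call; the LLM emitted 1/1/1970 as a placeholder date.
    return f"Lunch booked for {date}, confirmation number XJ234"

def merge_tool_calls():
    reserved = next_free_date()
    date = reserved["next unscheduled day"]
    # Overwrite the placeholder with the first tool's result, then run it.
    lines = [f"Next free date: {date}", book_lunch(date)]
    return "\n".join(lines)

print(merge_tool_calls())
# Next free date: 7/1/2024
# Lunch booked for 7/1/2024, confirmation number XJ234
```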

scruffynerf commented 4 months ago

I may even add an option to add 'self-executing' for toolsfortoolless, just to see how it works.

scruffynerf commented 4 months ago

> I may even add an option to add 'self-executing' for toolsfortoolless, just to see how it works.

Actually, mulling it over, making this a separate capability makes more sense. It's far easier to add 3-4 capabilities than to have one with stuff you don't want, even if they all have flags to disable them.

CallumMcMahon commented 4 months ago

I've been trying the state-transitions feature and wanted to keep the graph as simple as possible, keeping possible transitions to 1 for much of the graph.

I came up with this solution:

```python
# where applicable, make the same agent able to both invoke and execute the same function
agent.register_for_execution()(function)
agent.register_for_llm()(function)

def state_transition(last_speaker: Agent, groupchat: GroupChat):
    messages = groupchat.messages
    # if the last message proposed a tool call that the last speaker can
    # execute itself, let the same speaker go again to run it
    if "tool_calls" in messages[-1]:
        called = messages[-1]["tool_calls"][0]["function"]["name"]
        if called in last_speaker.function_map:
            return last_speaker
    return "auto"

groupchat = GroupChat(
    ...
    allowed_or_disallowed_speaker_transitions=allowed_transitions,
    speaker_transitions_type="allowed",
    speaker_selection_method=state_transition,
)
```

I'm new to autogen, but I would be hesitant to try nested chats given the big jump in complexity managing state for web apps. This solution works well for me, keeping complexity low on both function calling and managing messages. Let me know if I'm missing anything obvious! Thanks