microsoft / autogen

A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
https://microsoft.github.io/autogen/
Creative Commons Attribution 4.0 International

[Issue]: GroupChat with function calling and custom speaker selection does not work (i.e. cannot find the registered functions/tools associated with the agents) #2472

Open YagnaDeepika opened 4 months ago

YagnaDeepika commented 4 months ago

Describe the issue

I am implementing a multi-agent framework for Planner-Coder-CoderCritic-Executor-Admin agents using GroupChat. Each of these agents calls a tool (custom function) that I define for a specific code generation scenario. To have more control over the orchestration, I tried implementing the custom speaker selection method as outlined in this sample notebook. The difference between this sample notebook and my case is that I have to call custom functions with each of the agents.

I was able to get GroupChat with these agents (and tools) working by using the default speaker selection method. Since the llm_config parameter of GroupChatManager cannot take tools or functions, I had to add the following code snippet to get the tools to work with GroupChatManager:

llm_config_manager = llm_config.copy()
llm_config_manager.pop("functions", None)
llm_config_manager.pop("tools", None)

However, when I add speaker_selection_method=custom_speaker_selection_func, I get the following error:

***** Response from calling tool (call_azvlwve54bGiqDDNjYn4nuUI) *****
Error: Function planner_function not found.

Note that planner_function is the name of the function that I registered with the Planner agent as caller and the Admin agent as executor (user proxy). Once I introduce custom_speaker_selection_func, none of the tools registered with the agents can be found.

Steps to reproduce

I created a simple version of this issue below, with dummy Planner, Coder agents and an Admin agent.


import os
from dotenv import load_dotenv
from typing_extensions import Annotated
import autogen
from autogen import Agent

config_list = [
    {
        "model": "gpt4-turbo-1106-preview",
        "api_key": os.getenv("AZURE_OPENAI_API_KEY"),
        "api_type": "azure",
        "azure_endpoint": os.getenv("AZURE_OPENAI_API_BASE"),
        "api_version": os.getenv("AZURE_OPENAI_API_VERSION"),
    }
]

# Construct agents
llm_config = {"config_list": config_list, "cache_seed": 42}

planner = autogen.AssistantAgent(
    name="Planner",
    system_message="""You are an expert Planner. You suggest a detailed plan of action based on the user query.
You have access to the 'planner_function' tool. Once the plan is generated, move on to the Coder Agent.
""",
    llm_config=llm_config,
    # code_execution_config=False,
    description="""Agent that generates a detailed plan of action based on the user query.
The Planner agent has access to the 'planner_function' tool. Once the plan is generated, the Planner agent moves on to the Coder Agent.
""",
    is_termination_msg=lambda x: True if "TERMINATE" in x.get("content") else False,
)

coder = autogen.AssistantAgent(
    name="Coder",
    system_message="""You are an expert in writing Search queries by following the detailed plan generated by the Planner. You have access to the 'coder_function' tool.
    """,
    llm_config=llm_config,
    # code_execution_config=False,
    description="""Agent that writes Search queries based on the plan generated by the Planner.
The Coder agent has access to the 'coder_function' tool
""",
    is_termination_msg=lambda x: True if "TERMINATE" in x.get("content") else False,
)

admin = autogen.UserProxyAgent(
    name="Admin",
    system_message="""A human admin proxy. You interact with the system as a human user.
You reply 'TERMINATE' if the task is done.""",
    llm_config=llm_config,
    human_input_mode="NEVER",
    default_auto_reply="Reply `TERMINATE` if the task is done.",
    description="""Agent that is a proxy for a user. The Admin agent interacts with the system as a human user.
The Admin agent replies 'TERMINATE' if the task is done.""",
    is_termination_msg=lambda x: True if "TERMINATE" in x.get("content") else False,
    code_execution_config=False,
)

# Defining the agent tools

# Planner function
def planner_function(user_query: Annotated[str, "user's natural language query"]) -> dict:
    plan = {
        "dummy": f"dummy plan for {user_query}",
    }
    return plan

autogen.agentchat.register_function(
    planner_function,
    caller=planner,
    executor=admin,
    name="planner_function",
    description="Planner function that takes a user query and generates a plan of action to solve the task",
)

# Coder function
def coder_function(
    plan: Annotated[dict, "Planner output. This is the output of the Planner Agent's 'planner_function' tool"],
) -> dict:
    query = {
        "dummy": f"dummy query for {plan}",
    }
    return query

autogen.agentchat.register_function(
    coder_function,
    caller=coder,
    executor=admin,
    name="coder_function",
    description="Coder function that generates Search queries based on the planner output",
)

# Preparing llm config for GroupChatManager
llm_config_manager = llm_config.copy()
llm_config_manager.pop("functions", None)
llm_config_manager.pop("tools", None)

# Custom speaker selection function
def simplified_custom_speaker_selection_func(last_speaker: Agent, groupchat: autogen.GroupChat):
    """Define a customized speaker selection function.
    A recommended way is to define a transition for each speaker in the groupchat.

    Returns:
        Return an `Agent` class or a string from ['auto', 'manual', 'random', 'round_robin'] to select a default method to use.
    """
    messages = groupchat.messages

    if len(messages) <= 1:
        return planner

    if last_speaker is planner:
        return coder

    if last_speaker is coder:
        return admin

    if last_speaker is admin:
        if "TERMINATE" not in messages[-1]["content"]:
            if messages[-2]["name"] == "Planner":
                return coder
            elif messages[-2]["name"] == "Coder":
                return admin

# GroupChatManager
groupchat_with_intros = autogen.GroupChat(
    agents=[planner, coder, admin],
    messages=[],
    max_round=10,
    send_introductions=True,
    speaker_selection_method=simplified_custom_speaker_selection_func,
)
manager = autogen.GroupChatManager(groupchat=groupchat_with_intros, llm_config=llm_config_manager)

# Start the conversation
user_query = "Please generate a Search query to find the latest document for Record 1234"

result = admin.initiate_chat(
    manager,
    message=user_query,
)

Screenshots and logs

No response

Additional Information

- AutoGen Version: 0.2.26
- Operating System: Ubuntu 22.04 (WSL)
- Python Version: 3.10.13

ekzhu commented 4 months ago

I think the custom speaker selection does not automatically choose the tool executor agent to speak next when the previous message is a tool call. cc @yiranwu0 @qingyun-wu

For now you can encode the tool call logic in the custom speaker selection function. But a longer term goal should be to perform the tool call of each agent in a nested chat.

yiranwu0 commented 4 months ago

Yes, you need to determine which agent to select when you write your function. For example, check for tool calls in the messages and make sure you always select 'admin' if a tool call is found.

Btw, is this error message from 'admin' or other agents? ***** Response from calling tool (call_azvlwve54bGiqDDNjYn4nuUI) ***** Error: Function planner_function not found.

If it is from another agent that is not registered as the executor, you should revise your custom function. If it is from admin, can you double-check that the tool is in admin's _function_map?

YagnaDeepika commented 4 months ago

Hi @yiranwu0 - thank you very much for your response. As you can see from the "steps to reproduce" section above, I have a planner_function tool that I register with planner agent as caller and admin agent as executor with the following code:

autogen.agentchat.register_function(
    planner_function,
    caller=planner,
    executor=admin,
    name="planner_function",
    description="Planner function that takes a user query and generates a plan of action to solve the task",
)

Similarly, I register the coder_function tool with coder agent as caller and admin agent as executor. The admin.function_map returns the following:

{'planner_function': <function __main__.planner_function(user_query: typing.Annotated[str, "user's natural language query"]) -> dict>, 'coder_function': <function __main__.coder_function(plan: typing.Annotated[dict, "Planner output. This is the output of the Planner Agent's 'planner_function' tool"]) -> dict>}

It seems that the tool calls are all correct.

@ekzhu - thank you very much for your response. Am I understanding correctly that tool use with custom speaker selection in group chat is currently not supported? As you can see from the example above, I have registered the tools with the appropriate agents so I'm trying to understand why this does not work. Incidentally, if I set speaker_selection_method = auto, the exact same code works perfectly and it is able to find all the registered tools. So I don't think there is any problem with the way the tools are registered. The issue seems to be something to do specifically with the speaker_selection_method=simplified_custom_speaker_selection_func as defined above in the "steps to reproduce" section.

Thanks again for your time and responses. I look forward to understanding the reason for this issue.

ekzhu commented 4 months ago

@YagnaDeepika yes, you are correct: currently the logic of choosing the next agent to execute tools must be encoded inside your custom speaker selection function. Otherwise, please use auto. You can also use constrained speaker selection, which is compatible with auto.

YagnaDeepika commented 4 months ago

@ekzhu - thank you for your response. This is helpful. Can this "group chat with custom speaker selection with each agent using tools" support be submitted as a feature request?

ekzhu commented 4 months ago

This could be a useful feature. @joshkyh @marklysze @yiranwu0 @sonichi @freedeaths what do you think? I think the feature would be to bypass the custom speaker selection method when a tool call message is received.

YagnaDeepika commented 4 months ago

Hi @ekzhu - in the use case that I have, I would like more control over the speaker selection through the custom speaker selection function option, and I want each agent to use tools available to them in a group chat conversation pattern. So for this use case, it's key to retain the custom speaker selection option but add tool support to it (so I would not want to bypass the custom speaker selection method). I hope that makes sense. You can take a look at the "Steps to reproduce" section above for a simplified example of this pattern. Thanks!

yiranwu0 commented 4 months ago

@YagnaDeepika You can write

    if "tool_calls" in messages[-1]:
        return 'auto' # to go back to the "auto" selection

or

    if "tool_calls" in messages[-1]:
        return admin

The first one falls back to the "auto" selection when a tool call is found. The second one returns admin directly.

yiranwu0 commented 4 months ago

Basically, if you return an Agent object, it will be taken as the next speaker. But you can still choose one of the default selection methods by returning a string.
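Putting the two suggestions above together, a minimal sketch of the selection function from "Steps to reproduce" with the tool-call check added might look like the following. The factory name make_speaker_selector is hypothetical and exists only so the sketch runs without AutoGen installed; the transitions are copied from the original function, and falling back to "auto" for uncovered cases is an assumption based on the comment above about returning a string.

```python
def make_speaker_selector(planner, coder, admin):
    """Build a custom speaker selection function for the three example agents.

    Hypothetical helper: in the original script the agents are module-level
    variables, so the inner function could be defined directly instead.
    """

    def select_speaker(last_speaker, groupchat):
        messages = groupchat.messages
        last = messages[-1] if messages else {}

        # A pending tool call must go to the agent registered as executor,
        # otherwise the next speaker reports "Function ... not found".
        if last.get("tool_calls"):
            return admin

        if len(messages) <= 1:
            return planner
        if last_speaker is planner:
            return coder
        if last_speaker is coder:
            return admin
        if last_speaker is admin:
            # Content may be missing or None on some messages, so normalise it.
            if "TERMINATE" not in (last.get("content") or ""):
                previous = messages[-2].get("name")
                if previous == "Planner":
                    return coder
                if previous == "Coder":
                    return admin

        # Anything not covered above is left to the built-in "auto" method
        # (assumption: the original function returned nothing in these cases).
        return "auto"

    return select_speaker
```

With the agents from the example, this would be passed as speaker_selection_method=make_speaker_selector(planner, coder, admin) when constructing the GroupChat.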

joshkyh commented 4 months ago

> This could be a useful feature. @joshkyh @marklysze @yiranwu0 @sonichi @freedeaths what do you think? I think the feature would be to bypass the custom speaker selection method when a tool call message is received.

Hmm, currently when a tool call message is received, falling back to auto is possible, but the user needs to specify it in the function (Yiran's reply). Making the fallback automatic sounds like a tradeoff between precise control (from the user-specified function) and improved user experience (from the fallback). Another perspective: the user-specified function currently overrides graph constraints, and this proposal would have the tool call fallback override the user-specified function. Could be fine, just raising awareness.

ekzhu commented 4 months ago

> So for this use case, it's key to retain the custom speaker selection option but add tool support to it (so I would not want to bypass the custom speaker selection method). I hope that makes sense.

@YagnaDeepika I am not suggesting bypassing the custom speaker selection method all the time. I meant bypassing custom speaker selection only when there is a tool call message: effectively what @yiranwu0 suggested previously, but with that code moved from your custom speaker selection method into the GroupChat itself.

> Making the fallback automatic sounds like a tradeoff between precise control (from the user specified function) and improved user experience (from fallback).

@joshkyh @yiranwu0 Thanks for the inputs. Is there a case when we don't want to trigger tool execution when a tool call message is suggested?

I guess we can first document this better, specifically mentioning the case of tool call messages.
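Until something like this lives in GroupChat itself, the "bypass only on tool call messages" idea can be approximated on the user side by wrapping an existing selection function. The sketch below is an illustration only; route_tool_calls_to and is_tool_call are hypothetical names, not AutoGen APIs, and the check for the legacy function_call key is an assumption about which message shapes may appear.

```python
from typing import Any, Callable, Dict


def is_tool_call(message: Dict[str, Any]) -> bool:
    """Return True if the message carries a tool call (or a legacy function call)."""
    return bool(message.get("tool_calls") or message.get("function_call"))


def route_tool_calls_to(executor: Any, select_fn: Callable) -> Callable:
    """Wrap a custom speaker selection function.

    When the last message is a tool call, the executor is returned and
    select_fn is not consulted. In every other case the decision is
    delegated to select_fn unchanged.
    """

    def wrapped(last_speaker, groupchat):
        messages = groupchat.messages
        if messages and is_tool_call(messages[-1]):
            return executor
        return select_fn(last_speaker, groupchat)

    return wrapped
```

In the example from this issue, the wrapped function would be passed as speaker_selection_method=route_tool_calls_to(admin, simplified_custom_speaker_selection_func), leaving the original function untouched.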

joshkyh commented 4 months ago

@ekzhu I guess the question is more: when the automatic fallback kicks in and overrides the custom speaker selection function, is that what the user wanted? Suppose there are 3 agents, each holding a different but overlapping set of tools. The custom selection function precisely returned agent A, but if we fall back automatically to auto, triggered by the detection of a tool call, there is a risk that the LLM picks agent B or C. Given the OP's initial question, perhaps if the tool is found in agent A (returned by the custom selection function), then the automatic fallback should be disabled?

On the other hand, I can also see the benefit in another use case, where the user forgot to specify the tool call fallback and the automatic fallback is useful. I guess this is a trade-off.
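The rule proposed above (keep the custom function's choice when that agent holds the requested tool, and only fall back otherwise) could be sketched roughly as below. This is an illustration under assumptions, not an existing AutoGen feature: pick_executor and requested_tool_names are hypothetical names, each agent is assumed to expose a function_map dictionary like the one printed for admin earlier in this thread, and tool calls are assumed to follow the OpenAI message format.

```python
from typing import Any, Dict, Iterable, List


def requested_tool_names(message: Dict[str, Any]) -> List[str]:
    """Collect the function names requested by a message, if any."""
    names = [call["function"]["name"] for call in message.get("tool_calls") or []]
    legacy = message.get("function_call")
    if legacy:
        names.append(legacy["name"])
    return names


def pick_executor(message: Dict[str, Any], preferred: Any, agents: Iterable[Any]) -> Any:
    """Decide who should answer a message that may contain tool calls.

    preferred is whatever the custom selection function returned. It is kept
    when there is no tool call or when it can run every requested tool.
    Otherwise the first agent that can run them all is chosen, and "auto"
    is returned if no agent qualifies.
    """
    names = requested_tool_names(message)
    if not names:
        return preferred

    def can_run(agent: Any) -> bool:
        function_map = getattr(agent, "function_map", None) or {}
        return all(name in function_map for name in names)

    if can_run(preferred):
        return preferred
    for agent in agents:
        if can_run(agent):
            return agent
    return "auto"
```

Choosing the first capable agent is a deliberately simple tie-break; with overlapping tool sets a real implementation would need an explicit policy for which agent wins.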

marklysze commented 4 months ago

> This could be a useful feature. @joshkyh @marklysze @yiranwu0 @sonichi @freedeaths what do you think? I think the feature would be to bypass the custom speaker selection method when a tool call message is received.

I'm not overly familiar with the exact steps taken when a tool call message is received; however, this suggested feature sounds useful. I am, myself, not clear on how best to control which agent responds to a received tool call message, and having a simple way to do that would help. I'd be happy to test any changes here with non-OpenAI models, as there are challenges in mixing tool-calling and non-tool-calling agents in a group chat with them.

ekzhu commented 4 months ago

Great points. One potential way to do this is to remove the concept of tool call agents and replace it with agents that self-execute tools, so that we don't have these problems.