microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/
Creative Commons Attribution 4.0 International
30.73k stars · 4.48k forks

[Feature Request]: Implement a new `speaker_selection_method` called `fsm` for the `Groupchat` class #1400

Closed freedeaths closed 7 months ago

freedeaths commented 7 months ago

Is your feature request related to a problem? Please describe.

Last week, in an effort to use AutoGen's GroupChat for production scenarios, I ran a series of experiments to evaluate the precision and dependability of the outcomes. Despite numerous tweaks to the prompts, including the system_prompt and the description of each role (particularly the descriptions), I observed that the manager still struggled to select the next speaker according to the guidelines in the descriptions.

To scrutinize the manager’s decision-making capability in choosing the next speaker in isolation, I devised a more streamlined experiment.

# pyautogen == 0.2.8

config_list = autogen.config_list_from_dotenv(
        dotenv_file_path='.env',
        model_api_key_map={'gpt-4':'OPENAI_API_KEY'},
        filter_dict={
            "model": {
                "gpt-4"
            }
        }
    )

gpt_config = {
    "cache_seed": None,
    "temperature": 0,
    "config_list": config_list,
    "timeout": 100,
}

engineer = autogen.AssistantAgent(
    name="Engineer",
    llm_config=gpt_config,
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    description="""I am **ONLY** allowed to speak **immediately** after `Planner`, `Critic` and `Executor`.
If the last number mentioned by `Critic` is not a multiple of 5, the next speaker must be `Engineer`.
"""
)
planner = autogen.AssistantAgent(
    name="Planner",
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    llm_config=gpt_config,
    description="""I am **ONLY** allowed to speak **immediately** after `User` or `Critic`.
If the last number mentioned by `Critic` is a multiple of 5, the next speaker must be `Planner`.
"""
)
executor = autogen.AssistantAgent(
    name="Executor",
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    llm_config=gpt_config,
    description="""I am **ONLY** allowed to speak **immediately** after `Engineer`.
If the last number mentioned by `Engineer` is a multiple of 3, the next speaker can only be `Executor`.
"""
)
critic = autogen.AssistantAgent(
    name="Critic",
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    llm_config=gpt_config,
    description="""I am **ONLY** allowed to speak **immediately** after `Engineer`.
If the last number mentioned by `Engineer` is not a multiple of 3, the next speaker can only be `Critic`.
"""
)
user_proxy = autogen.UserProxyAgent(
    name="User",
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    code_execution_config=False,
    human_input_mode="NEVER",
    llm_config=False,
    description="""
Never select me as a speaker. 
"""
)

groupchat = autogen.GroupChat(
    agents=[engineer, planner, executor, critic, user_proxy], messages=[], max_round=25, allow_repeat_speaker=False
)

manager = autogen.GroupChatManager(
    groupchat=groupchat, 
    llm_config=gpt_config,
    is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config=False,
)

# Start the experiment
user_proxy.initiate_chat(
    manager,
    message="1",
    clear_history=True
)

The system messages of the agents above define a sequential counting task. To focus on assessing the manager's capability, I designed the task to be as straightforward as possible. I also ran tests with GPT-3.5 to verify that it can tell whether a number under 30 is a multiple of 3 or 5, and found the LLM's performance up to par.

The description of each agent encodes the state transitions shown in the following diagram; the experiment essentially evaluates whether the manager can select the next speaker in accordance with the finite state machine's stipulations.

[Image: state-transition diagram of the allowed speaker transitions described above]
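For reference, the transition rules in the agents' descriptions can be written as a small deterministic function. This is a standalone sketch of my reading of those rules (the names and helper are hypothetical, not part of AutoGen):

```python
# Sketch of the finite state machine encoded in the agent descriptions.
# `next_speaker` is a hypothetical helper, independent of AutoGen.

def next_speaker(current: str, number: int) -> str:
    """Return the only speaker allowed after `current`, given the number it just said."""
    if current == "User":
        return "Planner"
    if current in ("Planner", "Executor"):
        return "Engineer"
    if current == "Engineer":
        # Engineer -> Executor if its number is a multiple of 3, else -> Critic
        return "Executor" if number % 3 == 0 else "Critic"
    if current == "Critic":
        # Critic -> Planner if its number is a multiple of 5, else -> Engineer
        return "Planner" if number % 5 == 0 else "Engineer"
    raise ValueError(f"unknown speaker: {current}")

# Walk the counting task: User says 1, each subsequent speaker adds 1.
speaker, number = "User", 1
trace = [(speaker, number)]
while number < 20:
    speaker = next_speaker(speaker, number)
    number += 1
    trace.append((speaker, number))
```

Under these rules the trace starts User 1, Planner 2, Engineer 3, Executor 4, and ends with Critic saying 20, after which Planner would output "TERMINATE".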

I ran the code locally with GPT-4-0613 (2 runs) and GPT-4-1106-preview (4 runs). None of the 6 runs produced the expected result.

The output of GPT-4-0613 is:

User (to chat_manager):

1

--------------------------------------------------------------------------------
Planner (to chat_manager):

2

--------------------------------------------------------------------------------
Engineer (to chat_manager):

3

--------------------------------------------------------------------------------
Executor (to chat_manager):

4

--------------------------------------------------------------------------------
Engineer (to chat_manager):

5

--------------------------------------------------------------------------------
Planner (to chat_manager):

6

--------------------------------------------------------------------------------
Engineer (to chat_manager):

7

--------------------------------------------------------------------------------
Critic (to chat_manager):

8

--------------------------------------------------------------------------------
Engineer (to chat_manager):

9

--------------------------------------------------------------------------------
Executor (to chat_manager):

10

--------------------------------------------------------------------------------
Engineer (to chat_manager):

11

--------------------------------------------------------------------------------
Critic (to chat_manager):

12

--------------------------------------------------------------------------------
Engineer (to chat_manager):

13

--------------------------------------------------------------------------------
Critic (to chat_manager):

14

--------------------------------------------------------------------------------
Engineer (to chat_manager):

15

--------------------------------------------------------------------------------
Planner (to chat_manager):

16

--------------------------------------------------------------------------------
Engineer (to chat_manager):

17

--------------------------------------------------------------------------------
Critic (to chat_manager):

18

--------------------------------------------------------------------------------
Engineer (to chat_manager):

19

--------------------------------------------------------------------------------
Critic (to chat_manager):

20

--------------------------------------------------------------------------------
Planner (to chat_manager):

TERMINATE

--------------------------------------------------------------------------------

The output of GPT-4-1106-preview is:

User (to chat_manager):

1

--------------------------------------------------------------------------------
Planner (to chat_manager):

2

--------------------------------------------------------------------------------
Engineer (to chat_manager):

3

--------------------------------------------------------------------------------
Executor (to chat_manager):

4

--------------------------------------------------------------------------------
Critic (to chat_manager):

5

--------------------------------------------------------------------------------
Planner (to chat_manager):

6

--------------------------------------------------------------------------------
Engineer (to chat_manager):

7

--------------------------------------------------------------------------------
Critic (to chat_manager):

8

--------------------------------------------------------------------------------
Engineer (to chat_manager):

9

--------------------------------------------------------------------------------
Executor (to chat_manager):

10

--------------------------------------------------------------------------------
Critic (to chat_manager):

11

--------------------------------------------------------------------------------
Engineer (to chat_manager):

12

--------------------------------------------------------------------------------
Executor (to chat_manager):

13

--------------------------------------------------------------------------------
Critic (to chat_manager):

14

--------------------------------------------------------------------------------
Engineer (to chat_manager):

15

--------------------------------------------------------------------------------
Planner (to chat_manager):

16

--------------------------------------------------------------------------------
Engineer (to chat_manager):

17

--------------------------------------------------------------------------------
Critic (to chat_manager):

18

--------------------------------------------------------------------------------
Engineer (to chat_manager):

19

--------------------------------------------------------------------------------
Critic (to chat_manager):

20

--------------------------------------------------------------------------------
Planner (to chat_manager):

TERMINATE

--------------------------------------------------------------------------------

I then found PR #857, so I implemented the following code:

# in GraphGroupChat branch

import matplotlib.pyplot as plt
import networkx as nx

from autogen.agentchat import Agent, GroupChat, AssistantAgent, UserProxyAgent, GroupChatManager
from autogen.oai.openai_utils import config_list_from_dotenv

config_list = config_list_from_dotenv(
        dotenv_file_path='.env',
        model_api_key_map={'gpt-4':'OPENAI_API_KEY'},
        filter_dict={
            "model": {
                "gpt-4"
            }
        }
    )

gpt_config = {
    "cache_seed": None,  # change the cache_seed for different trials
    "temperature": 0,
    "config_list": config_list,
    "timeout": 100,
}

engineer = AssistantAgent(
    name="Engineer",
    llm_config=gpt_config,
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    description="""I am **ONLY** allowed to speak **immediately** after `Planner`, `Critic` and `Executor`.
If the last number mentioned by `Critic` is not a multiple of 5, the next speaker must be `Engineer`.
"""
)
planner = AssistantAgent(
    name="Planner",
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    llm_config=gpt_config,
    description="""I am **ONLY** allowed to speak **immediately** after `User` or `Critic`.
If the last number mentioned by `Critic` is a multiple of 5, the next speaker must be `Planner`.
"""
)
executor = AssistantAgent(
    name="Executor",
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("FINISH"),
    llm_config=gpt_config,
    description="""I am **ONLY** allowed to speak **immediately** after `Engineer`.
If the last number mentioned by `Engineer` is a multiple of 3, the next speaker can only be `Executor`.
"""
)
critic = AssistantAgent(
    name="Critic",
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    llm_config=gpt_config,
    description="""I am **ONLY** allowed to speak **immediately** after `Engineer`.
If the last number mentioned by `Engineer` is not a multiple of 3, the next speaker can only be `Critic`.
"""
)
user_proxy = UserProxyAgent(
    name="User",
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    code_execution_config=False,
    human_input_mode="NEVER",
    llm_config=False,
    description="""
Never select me as a speaker. 
"""
)

agents = [user_proxy, engineer, planner, executor, critic]

graph_dict = {}
graph_dict["User"] = [planner]
graph_dict["Planner"] = [engineer]
graph_dict["Engineer"] = [critic, executor]
graph_dict["Critic"] = [engineer, planner]
graph_dict["Executor"] = [engineer]

# Visualization only

graph = nx.DiGraph()

# Add nodes
graph.add_nodes_from([agent.name for agent in agents])

# Add edges
for key, value in graph_dict.items():
    for agent in value:
        graph.add_edge(key, agent.name)

group_chat = GroupChat(agents=agents, messages=[], max_round=25, graph_dict=graph_dict, allow_repeat_speaker=False)

# Create the manager
manager = GroupChatManager(
    groupchat=group_chat, 
    llm_config=gpt_config,
    is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config=False,
)

user_proxy.initiate_chat(
    manager,
    message="1",
    clear_history=True
)

And I still got wrong results from both GPT-4-0613 and GPT-4-1106-preview.

The output of GPT-4-0613 is:

User (to chat_manager):

1

--------------------------------------------------------------------------------
Planner (to chat_manager):

2

--------------------------------------------------------------------------------
Engineer (to chat_manager):

3

--------------------------------------------------------------------------------
Executor (to chat_manager):

4

--------------------------------------------------------------------------------
Critic (to chat_manager):

5

--------------------------------------------------------------------------------
Planner (to chat_manager):

6

--------------------------------------------------------------------------------
Engineer (to chat_manager):

7

--------------------------------------------------------------------------------
Critic (to chat_manager):

8

--------------------------------------------------------------------------------
Engineer (to chat_manager):

9

--------------------------------------------------------------------------------
Executor (to chat_manager):

10

--------------------------------------------------------------------------------
Critic (to chat_manager):

11

--------------------------------------------------------------------------------
Engineer (to chat_manager):

12

--------------------------------------------------------------------------------
Executor (to chat_manager):

13

--------------------------------------------------------------------------------
Critic (to chat_manager):

14

--------------------------------------------------------------------------------
Engineer (to chat_manager):

15

--------------------------------------------------------------------------------
Planner (to chat_manager):

16

--------------------------------------------------------------------------------
Executor (to chat_manager):

17

--------------------------------------------------------------------------------
Critic (to chat_manager):

18

--------------------------------------------------------------------------------
Engineer (to chat_manager):

19

--------------------------------------------------------------------------------
Critic (to chat_manager):

20

--------------------------------------------------------------------------------
Planner (to chat_manager):

TERMINATE

The output of GPT-4-1106-preview is:

User (to chat_manager):

1

--------------------------------------------------------------------------------
Planner (to chat_manager):

2

--------------------------------------------------------------------------------
Engineer (to chat_manager):

3

--------------------------------------------------------------------------------
Executor (to chat_manager):

4

--------------------------------------------------------------------------------
Critic (to chat_manager):

5

--------------------------------------------------------------------------------
Planner (to chat_manager):

6

--------------------------------------------------------------------------------
Engineer (to chat_manager):

7

--------------------------------------------------------------------------------
Critic (to chat_manager):

8

--------------------------------------------------------------------------------
Engineer (to chat_manager):

9

--------------------------------------------------------------------------------
Executor (to chat_manager):

10

--------------------------------------------------------------------------------
Critic (to chat_manager):

11

--------------------------------------------------------------------------------
Engineer (to chat_manager):

12

--------------------------------------------------------------------------------
Executor (to chat_manager):

13

--------------------------------------------------------------------------------
Critic (to chat_manager):

14

--------------------------------------------------------------------------------
Engineer (to chat_manager):

15

--------------------------------------------------------------------------------
Executor (to chat_manager):

16

--------------------------------------------------------------------------------
Planner (to chat_manager):

17

--------------------------------------------------------------------------------
Engineer (to chat_manager):

18

--------------------------------------------------------------------------------
Executor (to chat_manager):

19

--------------------------------------------------------------------------------
Critic (to chat_manager):

20

--------------------------------------------------------------------------------
Planner (to chat_manager):

TERMINATE

Describe the solution you'd like

Therefore, I want to implement a new speaker_selection_method called fsm, which stands for "finite state machine".

The usage is: define a finite state machine fsm after initializing each Agent and before initializing GroupChat, then set speaker_selection_method="fsm" when initializing GroupChat and pass a new parameter finite_state_machine (it defaults to None, so it does not break the original design and usage).

The usage is like:

fsm = {
    "agents": [engineer, planner, executor, critic, user_proxy],
    "transitions": [
        {"from": user_proxy, "to": planner, "on": None},
        {"from": planner, "to": engineer, "on": None},
        {"from": engineer, "to": executor, "on": "If the last number mentioned by `Engineer` is a multiple of 3, the next speaker can only be `Executor`."},
        {"from": executor, "to": engineer, "on": None},
        {"from": engineer, "to": critic, "on": "If the last number mentioned by `Engineer` is not a multiple of 3, the next speaker can only be `Critic`."},
        {"from": critic, "to": engineer, "on": "If the last number mentioned by `Critic` is not a multiple of 5, the next speaker can only be `Engineer`."},
        {"from": critic, "to": planner, "on": "If the last number mentioned by `Critic` is a multiple of 5, the next speaker can only be `Planner`."},
    ]
}
groupchat = autogen.GroupChat(
    agents=[engineer, planner, executor, critic, user_proxy], messages=[], max_round=25, allow_repeat_speaker=False, finite_state_machine=fsm, speaker_selection_method="fsm"
)
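The core selection logic this enables is simple: when the last speaker has exactly one outgoing transition, the next speaker is fixed; when there are several, the LLM only has to choose among that restricted candidate set. A minimal standalone sketch of that idea, using plain strings and a hypothetical `resolve_next` helper (not the actual AutoGen API):

```python
# Sketch of resolving the next speaker from a transition table.
# Hypothetical names; agent names are strings here for simplicity.

from typing import Optional

def resolve_next(fsm: dict, last_speaker: str) -> tuple[Optional[str], list[str]]:
    """Return (selected, candidates): a single successor is chosen
    deterministically; multiple successors are left for the LLM to pick from."""
    successors = [t["to"] for t in fsm["transitions"] if t["from"] == last_speaker]
    if len(successors) == 1:
        return successors[0], successors   # deterministic transition
    return None, successors                # defer to the LLM, restricted to candidates

fsm = {
    "transitions": [
        {"from": "User", "to": "Planner"},
        {"from": "Planner", "to": "Engineer"},
        {"from": "Engineer", "to": "Executor"},
        {"from": "Engineer", "to": "Critic"},
        {"from": "Executor", "to": "Engineer"},
        {"from": "Critic", "to": "Engineer"},
        {"from": "Critic", "to": "Planner"},
    ],
}
```

For example, after `Executor` the only candidate is `Engineer`, while after `Engineer` the LLM must decide between `Executor` and `Critic` using the "on" conditions.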

Additional context

I have implemented the method locally, and then repeated the experiment with gpt-4-1106-preview, with a success rate of 10/10 = 100%.

freedeaths commented 7 months ago

What I have implemented so far is as follows:

def fsm_select_speaker(self, last_speaker: Agent, agents: Optional[List[Agent]] = None) -> Tuple[Optional[Agent], List[Agent]]:
    """Select the next speaker using the pre-defined finite state machine."""
    if agents is None:
        agents = self.agents

    to_agents = [t["to"] for t in self.finite_state_machine["transitions"] if t["from"] == last_speaker]
    if len(to_agents) == 1:
        # Exactly one successor: the transition is deterministic.
        return to_agents[0], to_agents
    elif len(to_agents) == 0:
        logger.warning(
            f"There is no successor from {last_speaker}. "
            "This is not expected, so all agents will be returned."
        )
        return None, agents
    else:
        # Multiple successors: defer to the LLM, restricted to these candidates.
        # TODO: the case of a transition from last_speaker to itself has not been handled yet.
        if self.allow_repeat_speaker:
            to_agents.append(last_speaker)
        return None, to_agents
#select_speaker_messages = None
#if self.speaker_selection_method.lower() == "manual":
#    selected_agent = self.manual_select_speaker(agents)
#elif self.speaker_selection_method.lower() == "round_robin":
#    selected_agent = self.next_agent(last_speaker, agents)
#elif self.speaker_selection_method.lower() == "random":
#    selected_agent = random.choice(agents)
elif self.speaker_selection_method.lower() == "fsm":
    selected_agent, agents = self.fsm_select_speaker(last_speaker, agents)
    select_speaker_messages = self.messages.copy()
    if select_speaker_messages[-1].get("function_call", False):
        select_speaker_messages[-1] = dict(select_speaker_messages[-1], function_call=None)
    if select_speaker_messages[-1].get("tool_calls", False):
        select_speaker_messages[-1] = dict(select_speaker_messages[-1], tool_calls=None)
    select_speaker_messages = select_speaker_messages[-1:]
#else:
#    ...

In this implementation and the experiment above, the FSM approach is quite successful. Moreover, it is a relatively universal way to organize a GroupChat, and it is largely non-breaking with respect to the existing code.

So may I submit a PR to contribute this FSM implementation?

victordibia commented 7 months ago

@freedeaths , thanks for the detailed description of the issue and your fsm approach. Looks interesting and I am sure it will benefit others. Tagging @afourney who has also been thinking in this direction.

afourney commented 7 months ago

Yes, there is overlap with the GraphGroupChat effort #857

freedeaths commented 7 months ago

Yes, there is overlap with the GraphGroupChat effort #857

Exactly. That's why I added experiments using the implementation in #857. The results were not as expected either.

IANTHEREAL commented 7 months ago

Using an FSM (finite state machine) to describe the process is a great approach.

freedeaths commented 7 months ago

As I mentioned here, I apologize if this has caused any confusion. The code I used for the GraphGroupChat experiment above was not the latest state of the GraphGroupChat branch. With the latest state, there were no errors in the manager's selection, and the accuracy was 5/5 = 100%.

freedeaths commented 7 months ago

I will close this issue, because PR #857 covers the requirement.