microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/
Creative Commons Attribution 4.0 International
30.73k stars · 4.48k forks

[Feature Request]: Implement a new `speaker_selection_method` called `fsm` for the `Groupchat` class #1400

Closed freedeaths closed 7 months ago

freedeaths commented 7 months ago

Is your feature request related to a problem? Please describe.

Last week, in an effort to use AutoGen's GroupChat for production scenarios, I ran a series of experiments to evaluate the precision and dependability of the outcomes. Despite numerous tweaks to the prompts, including the system_prompt and the description of each role (particularly the descriptions), I observed that the manager still struggled to select the next speaker according to the guidelines in the descriptions.

To scrutinize the manager’s decision-making capability in choosing the next speaker in isolation, I devised a more streamlined experiment.

# pyautogen == 0.2.8

config_list = autogen.config_list_from_dotenv(
        dotenv_file_path='.env',
        model_api_key_map={'gpt-4':'OPENAI_API_KEY'},
        filter_dict={
            "model": {
                "gpt-4"
            }
        }
    )

gpt_config = {
    "cache_seed": None,
    "temperature": 0,
    "config_list": config_list,
    "timeout": 100,
}

engineer = autogen.AssistantAgent(
    name="Engineer",
    llm_config=gpt_config,
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    description="""I am **ONLY** allowed to speak **immediately** after `Planner`, `Critic` and `Executor`.
If the last number mentioned by `Critic` is not a multiple of 5, the next speaker must be `Engineer`.
"""
)
planner = autogen.AssistantAgent(
    name="Planner",
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    llm_config=gpt_config,
    description="""I am **ONLY** allowed to speak **immediately** after `User` or `Critic`.
If the last number mentioned by `Critic` is a multiple of 5, the next speaker must be `Planner`.
"""
)
executor = autogen.AssistantAgent(
    name="Executor",
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    llm_config=gpt_config,
    description="""I am **ONLY** allowed to speak **immediately** after `Engineer`.
If the last number mentioned by `Engineer` is a multiple of 3, the next speaker can only be `Executor`.
"""
)
critic = autogen.AssistantAgent(
    name="Critic",
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    llm_config=gpt_config,
    description="""I am **ONLY** allowed to speak **immediately** after `Engineer`.
If the last number mentioned by `Engineer` is not a multiple of 3, the next speaker can only be `Critic`.
"""
)
user_proxy = autogen.UserProxyAgent(
    name="User",
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    code_execution_config=False,
    human_input_mode="NEVER",
    llm_config=False,
    description="""
Never select me as a speaker. 
"""
)

groupchat = autogen.GroupChat(
    agents=[engineer, planner, executor, critic, user_proxy], messages=[], max_round=25, allow_repeat_speaker=False
)

manager = autogen.GroupChatManager(
    groupchat=groupchat, 
    llm_config=gpt_config,
    is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config=False,
)

# Start the experiment
user_proxy.initiate_chat(
    manager,
    message="1",
    clear_history=True
)

The system messages of the agents above define a sequential counting task. To focus on assessing the manager's capability, I designed the task to be as straightforward as possible. I also ran tests with GPT-3.5 to verify that it can tell whether a number under 30 is a multiple of 3 or 5, and found the LLM's performance up to par.

The description of each agent encodes the state transitions shown in the following diagram; the experiment essentially evaluates whether the manager can select the next speaker in accordance with the finite state machine's stipulations.

[Image: state-transition diagram of the allowed speaker transitions described above]
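For reference, the transition rules in the agents' descriptions can be written as a small deterministic function. This is a standalone sketch of my reading of those rules (the names and helper are hypothetical, not part of AutoGen):

```python
# Sketch of the finite state machine encoded in the agent descriptions.
# `next_speaker` is a hypothetical helper, independent of AutoGen.

def next_speaker(current: str, number: int) -> str:
    """Return the only speaker allowed after `current`, given the number it just said."""
    if current == "User":
        return "Planner"
    if current in ("Planner", "Executor"):
        return "Engineer"
    if current == "Engineer":
        # Engineer -> Executor if its number is a multiple of 3, else -> Critic
        return "Executor" if number % 3 == 0 else "Critic"
    if current == "Critic":
        # Critic -> Planner if its number is a multiple of 5, else -> Engineer
        return "Planner" if number % 5 == 0 else "Engineer"
    raise ValueError(f"unknown speaker: {current}")

# Walk the counting task: User says 1, each subsequent speaker adds 1.
speaker, number = "User", 1
trace = [(speaker, number)]
while number < 20:
    speaker = next_speaker(speaker, number)
    number += 1
    trace.append((speaker, number))
```

Under these rules the trace starts User 1, Planner 2, Engineer 3, Executor 4, and ends with Critic saying 20, after which Planner would output "TERMINATE".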

I ran the code locally with GPT-4-0613 (2 runs) and GPT-4-1106-preview (4 runs). None of the 6 runs produced the expected result.

The output of GPT-4-0613 is:

User (to chat_manager):

1

--------------------------------------------------------------------------------
Planner (to chat_manager):

2

--------------------------------------------------------------------------------
Engineer (to chat_manager):

3

--------------------------------------------------------------------------------
Executor (to chat_manager):

4

--------------------------------------------------------------------------------
Engineer (to chat_manager):

5

--------------------------------------------------------------------------------
Planner (to chat_manager):

6

--------------------------------------------------------------------------------
Engineer (to chat_manager):

7

--------------------------------------------------------------------------------
Critic (to chat_manager):

8

--------------------------------------------------------------------------------
Engineer (to chat_manager):

9

--------------------------------------------------------------------------------
Executor (to chat_manager):

10

--------------------------------------------------------------------------------
Engineer (to chat_manager):

11

--------------------------------------------------------------------------------
Critic (to chat_manager):

12

--------------------------------------------------------------------------------
Engineer (to chat_manager):

13

--------------------------------------------------------------------------------
Critic (to chat_manager):

14

--------------------------------------------------------------------------------
Engineer (to chat_manager):

15

--------------------------------------------------------------------------------
Planner (to chat_manager):

16

--------------------------------------------------------------------------------
Engineer (to chat_manager):

17

--------------------------------------------------------------------------------
Critic (to chat_manager):

18

--------------------------------------------------------------------------------
Engineer (to chat_manager):

19

--------------------------------------------------------------------------------
Critic (to chat_manager):

20

--------------------------------------------------------------------------------
Planner (to chat_manager):

TERMINATE

--------------------------------------------------------------------------------

The output of GPT-4-1106-preview is:

User (to chat_manager):

1

--------------------------------------------------------------------------------
Planner (to chat_manager):

2

--------------------------------------------------------------------------------
Engineer (to chat_manager):

3

--------------------------------------------------------------------------------
Executor (to chat_manager):

4

--------------------------------------------------------------------------------
Critic (to chat_manager):

5

--------------------------------------------------------------------------------
Planner (to chat_manager):

6

--------------------------------------------------------------------------------
Engineer (to chat_manager):

7

--------------------------------------------------------------------------------
Critic (to chat_manager):

8

--------------------------------------------------------------------------------
Engineer (to chat_manager):

9

--------------------------------------------------------------------------------
Executor (to chat_manager):

10

--------------------------------------------------------------------------------
Critic (to chat_manager):

11

--------------------------------------------------------------------------------
Engineer (to chat_manager):

12

--------------------------------------------------------------------------------
Executor (to chat_manager):

13

--------------------------------------------------------------------------------
Critic (to chat_manager):

14

--------------------------------------------------------------------------------
Engineer (to chat_manager):

15

--------------------------------------------------------------------------------
Planner (to chat_manager):

16

--------------------------------------------------------------------------------
Engineer (to chat_manager):

17

--------------------------------------------------------------------------------
Critic (to chat_manager):

18

--------------------------------------------------------------------------------
Engineer (to chat_manager):

19

--------------------------------------------------------------------------------
Critic (to chat_manager):

20

--------------------------------------------------------------------------------
Planner (to chat_manager):

TERMINATE

--------------------------------------------------------------------------------

I then found PR #857, so I implemented the following code:

# in GraphGroupChat branch

import matplotlib.pyplot as plt
import networkx as nx

from autogen.agentchat import Agent, GroupChat, AssistantAgent, UserProxyAgent, GroupChatManager
from autogen.oai.openai_utils import config_list_from_dotenv

config_list = config_list_from_dotenv(
        dotenv_file_path='.env',
        model_api_key_map={'gpt-4':'OPENAI_API_KEY'},
        filter_dict={
            "model": {
                "gpt-4"
            }
        }
    )

gpt_config = {
    "cache_seed": None,  # change the cache_seed for different trials
    "temperature": 0,
    "config_list": config_list,
    "timeout": 100,
}

engineer = AssistantAgent(
    name="Engineer",
    llm_config=gpt_config,
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    description="""I am **ONLY** allowed to speak **immediately** after `Planner`, `Critic` and `Executor`.
If the last number mentioned by `Critic` is not a multiple of 5, the next speaker must be `Engineer`.
"""
)
planner = AssistantAgent(
    name="Planner",
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    llm_config=gpt_config,
    description="""I am **ONLY** allowed to speak **immediately** after `User` or `Critic`.
If the last number mentioned by `Critic` is a multiple of 5, the next speaker must be `Planner`.
"""
)
executor = AssistantAgent(
    name="Executor",
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("FINISH"),
    llm_config=gpt_config,
    description="""I am **ONLY** allowed to speak **immediately** after `Engineer`.
If the last number mentioned by `Engineer` is a multiple of 3, the next speaker can only be `Executor`.
"""
)
critic = AssistantAgent(
    name="Critic",
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    llm_config=gpt_config,
    description="""I am **ONLY** allowed to speak **immediately** after `Engineer`.
If the last number mentioned by `Engineer` is not a multiple of 3, the next speaker can only be `Critic`.
"""
)
user_proxy = UserProxyAgent(
    name="User",
    system_message="""Add 1 to the number output by the previous role. If the previous number is 20, output "TERMINATE".""",
    code_execution_config=False,
    human_input_mode="NEVER",
    llm_config=False,
    description="""
Never select me as a speaker. 
"""
)

agents = [user_proxy, engineer, planner, executor, critic]

graph_dict = {}
graph_dict["User"] = [planner]
graph_dict["Planner"] = [engineer]
graph_dict["Engineer"] = [critic, executor]
graph_dict["Critic"] = [engineer, planner]
graph_dict["Executor"] = [engineer]

# Visualization only

graph = nx.DiGraph()

# Add nodes
graph.add_nodes_from([agent.name for agent in agents])

# Add edges
for key, value in graph_dict.items():
    for agent in value:
        graph.add_edge(key, agent.name)

group_chat = GroupChat(agents=agents, messages=[], max_round=25, graph_dict=graph_dict, allow_repeat_speaker=False)

# Create the manager
manager = GroupChatManager(
    groupchat=group_chat, 
    llm_config=gpt_config,
    is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config=False,
)

user_proxy.initiate_chat(
    manager,
    message="1",
    clear_history=True
)

And I still got wrong results from both GPT-4-0613 and GPT-4-1106-preview.

The output of GPT-4-0613 is:

User (to chat_manager):

1

--------------------------------------------------------------------------------
Planner (to chat_manager):

2

--------------------------------------------------------------------------------
Engineer (to chat_manager):

3

--------------------------------------------------------------------------------
Executor (to chat_manager):

4

--------------------------------------------------------------------------------
Critic (to chat_manager):

5

--------------------------------------------------------------------------------
Planner (to chat_manager):

6

--------------------------------------------------------------------------------
Engineer (to chat_manager):

7

--------------------------------------------------------------------------------
Critic (to chat_manager):

8

--------------------------------------------------------------------------------
Engineer (to chat_manager):

9

--------------------------------------------------------------------------------
Executor (to chat_manager):

10

--------------------------------------------------------------------------------
Critic (to chat_manager):

11

--------------------------------------------------------------------------------
Engineer (to chat_manager):

12

--------------------------------------------------------------------------------
Executor (to chat_manager):

13

--------------------------------------------------------------------------------
Critic (to chat_manager):

14

--------------------------------------------------------------------------------
Engineer (to chat_manager):

15

--------------------------------------------------------------------------------
Planner (to chat_manager):

16

--------------------------------------------------------------------------------
Executor (to chat_manager):

17

--------------------------------------------------------------------------------
Critic (to chat_manager):

18

--------------------------------------------------------------------------------
Engineer (to chat_manager):

19

--------------------------------------------------------------------------------
Critic (to chat_manager):

20

--------------------------------------------------------------------------------
Planner (to chat_manager):

TERMINATE

The output of GPT-4-1106-preview is:

User (to chat_manager):

1

--------------------------------------------------------------------------------
Planner (to chat_manager):

2

--------------------------------------------------------------------------------
Engineer (to chat_manager):

3

--------------------------------------------------------------------------------
Executor (to chat_manager):

4

--------------------------------------------------------------------------------
Critic (to chat_manager):

5

--------------------------------------------------------------------------------
Planner (to chat_manager):

6

--------------------------------------------------------------------------------
Engineer (to chat_manager):

7

--------------------------------------------------------------------------------
Critic (to chat_manager):

8

--------------------------------------------------------------------------------
Engineer (to chat_manager):

9

--------------------------------------------------------------------------------
Executor (to chat_manager):

10

--------------------------------------------------------------------------------
Critic (to chat_manager):

11

--------------------------------------------------------------------------------
Engineer (to chat_manager):

12

--------------------------------------------------------------------------------
Executor (to chat_manager):

13

--------------------------------------------------------------------------------
Critic (to chat_manager):

14

--------------------------------------------------------------------------------
Engineer (to chat_manager):

15

--------------------------------------------------------------------------------
Executor (to chat_manager):

16

--------------------------------------------------------------------------------
Planner (to chat_manager):

17

--------------------------------------------------------------------------------
Engineer (to chat_manager):

18

--------------------------------------------------------------------------------
Executor (to chat_manager):

19

--------------------------------------------------------------------------------
Critic (to chat_manager):

20

--------------------------------------------------------------------------------
Planner (to chat_manager):

TERMINATE

Describe the solution you'd like

Therefore, I want to implement a new speaker_selection_method called fsm, which stands for "finite state machine".

The usage is: define a finite state machine fsm after initializing each Agent and before initializing GroupChat, then set speaker_selection_method="fsm" when initializing GroupChat and pass a new parameter finite_state_machine (it defaults to None, so it does not break the original design and usage).

The usage is like:

fsm = {
    "agents": [engineer, planner, executor, critic, user_proxy],
    "transitions": [
        {"from": user_proxy, "to": planner, "on": None},
        {"from": planner, "to": engineer, "on": None},
        {"from": engineer, "to": executor, "on": "If the last number mentioned by `Engineer` is a multiple of 3, the next speaker can only be `Executor`."},
        {"from": executor, "to": engineer, "on": None},
        {"from": engineer, "to": critic, "on": "If the last number mentioned by `Engineer` is not a multiple of 3, the next speaker can only be `Critic`."},
        {"from": critic, "to": engineer, "on": "If the last number mentioned by `Critic` is not a multiple of 5, the next speaker can only be `Engineer`."},
        {"from": critic, "to": planner, "on": "If the last number mentioned by `Critic` is a multiple of 5, the next speaker can only be `Planner`."},
    ]
}
groupchat = autogen.GroupChat(
    agents=[engineer, planner, executor, critic, user_proxy], messages=[], max_round=25, allow_repeat_speaker=False, finite_state_machine=fsm, speaker_selection_method="fsm"
)
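The core selection logic this enables is simple: when the last speaker has exactly one outgoing transition, the next speaker is fixed; when there are several, the LLM only has to choose among that restricted candidate set. A minimal standalone sketch of that idea, using plain strings and a hypothetical `resolve_next` helper (not the actual AutoGen API):

```python
# Sketch of resolving the next speaker from a transition table.
# Hypothetical names; agent names are strings here for simplicity.

from typing import Optional

def resolve_next(fsm: dict, last_speaker: str) -> tuple[Optional[str], list[str]]:
    """Return (selected, candidates): a single successor is chosen
    deterministically; multiple successors are left for the LLM to pick from."""
    successors = [t["to"] for t in fsm["transitions"] if t["from"] == last_speaker]
    if len(successors) == 1:
        return successors[0], successors   # deterministic transition
    return None, successors                # defer to the LLM, restricted to candidates

fsm = {
    "transitions": [
        {"from": "User", "to": "Planner"},
        {"from": "Planner", "to": "Engineer"},
        {"from": "Engineer", "to": "Executor"},
        {"from": "Engineer", "to": "Critic"},
        {"from": "Executor", "to": "Engineer"},
        {"from": "Critic", "to": "Engineer"},
        {"from": "Critic", "to": "Planner"},
    ],
}
```

For example, after `Executor` the only candidate is `Engineer`, while after `Engineer` the LLM must decide between `Executor` and `Critic` using the "on" conditions.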

Additional context

I have implemented the method locally, and then repeated the experiment with gpt-4-1106-preview, with a success rate of 10/10 = 100%.

freedeaths commented 7 months ago

What I have implemented so far is as follows:

def fsm_select_speaker(self, last_speaker: Agent, agents: Optional[List[Agent]] = None) -> Tuple[Optional[Agent], List[Agent]]:
    """Select the next speaker using the pre-defined finite state machine."""
    if agents is None:
        agents = self.agents

    to_agents = [t["to"] for t in self.finite_state_machine["transitions"] if t["from"] == last_speaker]
    if len(to_agents) == 1:
        # Exactly one successor: the transition is deterministic.
        return to_agents[0], to_agents
    elif len(to_agents) == 0:
        logger.warning(
            f"There is no successor from {last_speaker}. "
            "This is not expected, so all agents will be returned."
        )
        return None, agents
    else:
        # Multiple successors: defer to the LLM, restricted to these candidates.
        # TODO: the case of a transition from last_speaker to itself has not been handled yet.
        if self.allow_repeat_speaker:
            to_agents.append(last_speaker)
        return None, to_agents
#select_speaker_messages = None
#if self.speaker_selection_method.lower() == "manual":
#    selected_agent = self.manual_select_speaker(agents)
#elif self.speaker_selection_method.lower() == "round_robin":
#    selected_agent = self.next_agent(last_speaker, agents)
#elif self.speaker_selection_method.lower() == "random":
#    selected_agent = random.choice(agents)
elif self.speaker_selection_method.lower() == "fsm":
    selected_agent, agents = self.fsm_select_speaker(last_speaker, agents)
    select_speaker_messages = self.messages.copy()
    if select_speaker_messages[-1].get("function_call", False):
        select_speaker_messages[-1] = dict(select_speaker_messages[-1], function_call=None)
    if select_speaker_messages[-1].get("tool_calls", False):
        select_speaker_messages[-1] = dict(select_speaker_messages[-1], tool_calls=None)
    select_speaker_messages = select_speaker_messages[-1:]
#else:
#    ...

In this implementation and the experiment above, the FSM approach is quite successful. Moreover, it is a relatively universal way to organize a GroupChat, and it is largely non-breaking with respect to the existing code.

So may I submit a PR to contribute this FSM implementation?

victordibia commented 7 months ago

@freedeaths , thanks for the detailed description of the issue and your fsm approach. Looks interesting and I am sure it will benefit others. Tagging @afourney who has also been thinking in this direction.

afourney commented 7 months ago

Yes, there is overlap with the GraphGroupChat effort #857

freedeaths commented 7 months ago

Yes, there is overlap with the GraphGroupChat effort #857

Exactly. That's why I added experiments using the implementation in #857. The results were not as expected either.

IANTHEREAL commented 7 months ago

Using an FSM (finite state machine) to describe the process is a great approach.

freedeaths commented 7 months ago

As I mentioned here, I apologize if this has caused any confusion. The code I used for the GraphGroupChat experiment above was not the latest state of the GraphGroupChat branch. With the latest state, there were no errors in the manager's selection, and the accuracy was 5/5 = 100%.

freedeaths commented 7 months ago

I will close this issue, because PR #857 covers the requirement.