microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/

Finding a solution for the select_speaker and agent_by_name functions: Sanitization or ?? #489

Closed · robzsaunders closed this 1 month ago

robzsaunders commented 11 months ago

Hey everyone,

I was doing some probing into why the group chat manager just fails to do its job, and I have some questions.

1) On lines 153 to 156, why are we broadcasting the message to all agents?

https://github.com/microsoft/autogen/blob/b432c1b10823559189930986c0b668cc2548d34c/autogen/agentchat/groupchat.py#L136-L179

Instead of just sending the message to the selected speaker?

Many servers queue incoming messages. Personally, I've been using LM Studio and noticed that it notes "running queued message" and runs them one by one, which may be causing some of these local LLM group chat complications despite hacks to force the chat order.
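
For context, the broadcast in question amounts to roughly the following (a paraphrased sketch of the linked lines, not the exact source):

    # Paraphrased sketch of the broadcast step in GroupChatManager.run_chat
    # (not the exact source at the linked commit).
    for agent in groupchat.agents:
        if agent != speaker:
            # every agent gets a copy of the message, but nobody replies yet
            self.send(message, agent, request_reply=False, silent=True)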

robzsaunders commented 11 months ago

I'm currently checking to see if there's better performance by commenting out lines 154, 155 and 156

https://github.com/microsoft/autogen/blob/b432c1b10823559189930986c0b668cc2548d34c/autogen/agentchat/groupchat.py#L153-L156

and adding this between lines 162 and 164: self.send(message, speaker, request_reply=False, silent=True)
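
A rough sketch of what that experiment would look like in run_chat (untested; the line numbers refer to the linked commit):

    # Sketch of the experiment (untested): drop the broadcast and brief only the
    # agent that is about to speak.
    #
    # for agent in groupchat.agents:                     # lines 154-156 commented out
    #     if agent != speaker:
    #         self.send(message, agent, request_reply=False, silent=True)

    speaker = groupchat.select_speaker(speaker, self)     # existing speaker selection
    self.send(message, speaker, request_reply=False, silent=True)  # proposed addition
    reply = speaker.generate_reply(sender=self)           # existing reply generation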

afourney commented 11 months ago

Group Chat is designed to mimic... well, a group chat (e.g., on your phone or in Slack). We expect every agent to be aware of the shared conversation up to that point. Other workflows are certainly possible (like hub-and-spoke delegation), but they wouldn't be called group chat. Commenting out the broadcast will likely break many of the Group Chat demo scenarios.

When you say that you were investigating why the Group Chat Manager "fails to do its job", can you be more specific? What failures were you observing?

robzsaunders commented 11 months ago

Hey @afourney, thanks for the explanation. Makes sense!

Apologies for the long write-up, but I think I found the source of a lot of local LLMs' group chat problems. Since there are no error outputs when failures occur, nobody knew that the responses from the manager were incorrect.

Taking a look through the issue board and Discord, there seems to be a common theme over the last week or so of the group chat manager not working in the "correct order" or not "selecting agents correctly".

After doing some debugging using a few local LLMs, I think I found the issue. It is a pair of problems that are related to one another.


Problem 1

Example: Agent being called is named "Coder"

Similar to my ticket #399, where we needed to sanitize the input of raw code, the raw output of the manager's role message is not in the correct format for the agent_by_name function in groupchat.py.

The below is called at the end of the select_speaker function.

https://github.com/microsoft/autogen/blob/b432c1b10823559189930986c0b668cc2548d34c/autogen/agentchat/groupchat.py#L40-L42

Example outputs from the manager:

"The role I select is Coder", "Coder:", "```Coder"

The major offense, though, which I repeatedly keep seeing, is the manager responding:

"Coder: >>Writes out all the code<<"

What that function expects:

"Coder"

This causes the ValueError on line 106 to trigger, producing pseudo-correct-looking functionality. My belief is that because the ValueError fallback calls next_agent, the manager seems to be working, because I suspect most people order their agents in the group chat in the logical order of operations.
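
For reference, the lookup and the fallback behave roughly like this (a simplified sketch; the real code lives at the linked commit):

    # Simplified sketch of the name lookup and the fallback that hides the failure.
    def agent_by_name(self, name):
        # list.index raises ValueError unless the reply is an exact agent name
        return self.agents[self.agent_names.index(name)]

    # ...at the end of select_speaker():
    try:
        return self.agent_by_name(name)        # "I choose Coder: ..." is not an exact match
    except ValueError:
        return self.next_agent(last_speaker)   # silently falls back to round-robin order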


Rolling into problem 2

For some reason, and I'm not quite sure why, the manager is semi-ignoring the system prompts fed to it from select_speaker_msg and line 96 in groupchat.py.

Now, I don't know for sure how it works yet, but my intuition is that the self.messages on line 96 need to go in as user messages, and the prompt from the user needs to be a system message for the manager.

Before I finished up today, the last thing I did was swap line 95 from being a system tag to a user tag, and the manager stopped writing out full blocks of code.

It didn't output the correct single-word response of "Coder", but it went "I will choose the coder" or something like that. So there is something weird going on with how local LLMs interact with the group chat manager prompts.
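
Concretely, the tweak was roughly the following (a sketch based on the structure of select_speaker at the linked commit; the prompt text is paraphrased):

    # Sketch of the tweak: send the "pick the next role" prompt as a user message
    # instead of a system message, which some local models seem to follow better.
    selector.update_system_message(self.select_speaker_msg())
    final, name = selector.generate_oai_reply(
        self.messages
        + [
            {
                "role": "user",  # was "system" on line 95
                "content": f"Read the above conversation. Then select the next role from {self.agent_names} to play. Only return the role.",
            }
        ]
    )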

I'm not sure of the best way to approach this, but I think this is another roadblock for most local LLM users, and fixing it will help with the group chat problems they're facing.


Current behavior

4 agents, Manager, User_Proxy, Coder, QA

  1. User proxy "Hey write me a basic hello world in python" (to manager)

  2. System
    [ "Choose a role, only choose one role" ] (to manager)

  3. Manager
    "I choose Coder: '''Python (insert python code here)"

  4. autogen logic
    [ Manager has finished and I got its response. Checking message. Result: ValueError fail. Last agent: Coder. Next agent is... Coder ]

  5. Coder
    " '''Python (insert python code with incorrect code here) " (replies to autogen logic)

  6. autogen logic
    [ coder is done, manager picks another speaker]

  7. System
    [ "Choose a role, only choose one role" ] (to manager)

  8. Manager
    "I choose QA: "(Does full QA assessment that detects the incorrect code)

  9. autogen logic
    [ Manager has finished and I got its response. Checking message. Result: ValueError fail. Last agent: Coder. Next agent is... QA ]

  10. QA
    [ Does full assessment that detects the incorrect code ] (replies to autogen logic)

  11. autogen logic
    [ QA is done, manager picks another speaker]

  12. System
    [ "Choose a role, only choose one role" ] (to manager)

  13. Manager
    "I choose Coder: '''Python (insert python code here)"

  14. autogen logic
    [ Manager has finished and I got its response. Checking message. Result: ValueError fail. Last agent: QA. Next agent is... User_Proxy ]

  15. User Proxy

    Executes Code

Disclosure: I haven't done any testing using OpenAI's GPT models. This may or may not be a problem for GPT-3.5/4.

afourney commented 11 months ago

Thanks for the awesome deep dive. Keep them coming!

We've not done a lot of testing with local models, and I actually have no idea how well we should expect a GroupChatManager to function if backed by such models. One thing to try is to leave the Chat Manager as GPT-4, but use local LLMs everywhere else, and just compare performance.
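
For example, that split might look something like this (a sketch; the local endpoint, model names, and config keys are assumptions for an LM Studio-style OpenAI-compatible server, so adjust for your autogen/openai versions):

    # Sketch: GPT-4 backs the GroupChatManager, a local OpenAI-compatible server
    # (e.g. LM Studio) backs everything else. Endpoint and model names are assumptions.
    import autogen

    gpt4_config = {"config_list": [{"model": "gpt-4"}]}  # API key taken from OPENAI_API_KEY
    local_config = {
        "config_list": [
            {"model": "local-model", "api_base": "http://localhost:1234/v1", "api_key": "NULL"}
        ]
    }

    user_proxy = autogen.UserProxyAgent("User_Proxy", human_input_mode="NEVER")
    coder = autogen.AssistantAgent("Coder", llm_config=local_config)
    qa = autogen.AssistantAgent("QA", llm_config=local_config)

    groupchat = autogen.GroupChat(agents=[user_proxy, coder, qa], messages=[])
    manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=gpt4_config)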

Selection of the next agent is non-trivial, and frankly I'm surprised it works even with GPT-4. Here's one possible source of confusion: https://github.com/microsoft/autogen/issues/319

dogukanustun commented 11 months ago

Hello,

I am facing a similar issue. When I tried agent_by_name(), I also got a ValueError, but interestingly my example outputs are not like those given below (taken from robzsaunders).

Example outputs from the manager:

"The role I select is Coder" "Coder:" "```Coder"

What I get for the name parameter is the whole output of the LLM.

I am in a desperate situation and open to any solution.

Thanks.

robzsaunders commented 11 months ago

Yea that's what I was trying to communicate with:

The major offense, though, which I repeatedly keep seeing, is the manager responding:

"Coder: >>Writes out all the code<<"

The manager does the work first, but it is silent in the background and throws the role prompt for a loop.

robzsaunders commented 11 months ago

PR #500 is related to this, offering a partial solution.

afourney commented 11 months ago

I think there are a few things going on here, and I think we need a piecemeal approach to solving it. The PR #500 would solve the problem if you don't need dynamic orchestration. In other words, if you already know which agents should speak, and in what order, then Group Chat -- as it currently stands -- is not a great solution, and some deterministic alternative would be better.

However, if you still want dynamic orchestration, then what we need to do is improve the GroupChatManager's performance on local models. We can do this a few ways. First, we can try improving the prompt that it uses, or perhaps use a different prompt altogether, tuned to the local model. Alternatively we can improve parsing (or recognize the failure and remind the model to output the correct format, similar to TypeChat). This would add some robustness to the selection.
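
In the spirit of that second option, a retry-with-reminder wrapper might look like this (a hypothetical helper for illustration, not an existing autogen API; ask_model stands in for whatever calls the manager's LLM):

    # Hypothetical helper (not part of autogen): try to extract an exact agent name
    # from the manager's reply; if that fails, remind the model of the expected format.
    import re
    from typing import Callable, List, Optional

    def extract_speaker_name(reply: str, agent_names: List[str]) -> Optional[str]:
        # accept an exact match first, then a single unambiguous mention in a verbose reply
        if reply.strip() in agent_names:
            return reply.strip()
        mentioned = [n for n in agent_names if re.search(rf"\b{re.escape(n)}\b", reply)]
        return mentioned[0] if len(mentioned) == 1 else None

    def select_with_retry(ask_model: Callable[[str], str], agent_names: List[str], max_retries: int = 2) -> Optional[str]:
        prompt = f"Select the next role from {agent_names}. Reply with the role name only."
        for _ in range(max_retries + 1):
            name = extract_speaker_name(ask_model(prompt), agent_names)
            if name is not None:
                return name
            prompt = f"That was not a valid role name. Reply with exactly one of {agent_names} and nothing else."
        return None  # caller can fall back to round-robin, as select_speaker does today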

I would, however, openly wonder about effectiveness. Orchestration is a super complex problem, and it resembles planning. If the underlying LLM can't handle the instructions to output the correct format, I would naturally wonder how carefully considered its plans are.

robzsaunders commented 11 months ago

Yeah, it's why I opened this as an issue instead of knee-jerking a solution with a PR.

The orchestration needs some discussion.

I'm still convinced that something isn't being sent through properly to the local model (I'm using LM Studio), since editing the manager's system prompts doesn't do anything.

SoheylM commented 11 months ago

Hi there,

I am also working on it, serving Mistral 7B with vLLM using the OpenAI API endpoints.

EDIT: llm_config gets passed along via kwargs, so the first problem is already addressed. The second problem seems linked to serving Mistral 7B Instruct with vLLM. It may be related to the required prompt template.


[FIXED] The first problem I noticed with respect to running a local LLM using the llm_config dictionary, and correct me if I am wrong, is its absence in the GroupChatManager constructor. Unless you overwrite DEFAULT_MODEL to be the local LLM, or point the 'gpt-4', 'gpt-3.5-turbo', etc. model names to the local LLM, I believe two lines of code need to be added: one to accept the llm_config dictionary and one to pass it along to the parent class, ConversableAgent:

class GroupChatManager(ConversableAgent):
    def __init__(
        self,
        groupchat: GroupChat,
        name: Optional[str] = "chat_manager",
        # unlimited consecutive auto reply by default
        max_consecutive_auto_reply: Optional[int] = sys.maxsize,
        human_input_mode: Optional[str] = "NEVER",
        system_message: Optional[str] = "Group chat manager.",
        llm_config: Optional[Union[Dict, bool]] = None,
        # seed: Optional[int] = 4,
        **kwargs,
    ):
        super().__init__(
            name=name,
            max_consecutive_auto_reply=max_consecutive_auto_reply,
            human_input_mode=human_input_mode,
            system_message=system_message,
            llm_config=llm_config,
            **kwargs,
        )
        self.register_reply(Agent, GroupChatManager.run_chat, config=groupchat, reset_config=GroupChat.reset)
        # self._random = random.Random(seed)

The second problem, specific to my implementation, is in line with @dogukanustun's comment. The GroupChatManager never outputs anything related to a role. Instead it seems to answer the question directly, even though the role mentioned is Planner, Engineer, etc. The first agent that answers is always the first in the list, followed by the second one, and so on in order to the last. Then I swapped the positions of the agents in the group chat. The answers provided were exactly the same, meaning they are position dependent and not agent dependent. The first answer, second answer, etc. are always the same; only the name of the answering agent shown in the terminal changes, and it matches the ordering of the group chat list.

If my second problem is solved and I reach @robzsaunders' problem, I think I could offer some solutions. One would be prompt engineering to force the local LLM to spit out the correct role based on the group chat list (rough sketch below).
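
For instance, a stricter selection prompt could look something like this (illustrative only; the agent names are examples, and nothing here has been tested):

    # One possible stricter selection prompt (illustrative only; agent names are examples).
    agent_names = ["User_Proxy", "Coder", "QA"]
    select_speaker_prompt = (
        "You are coordinating a group chat between these roles: "
        + ", ".join(agent_names)
        + ".\nRead the conversation above and decide who should speak next.\n"
        "Answer with exactly one role name from the list, with no punctuation, "
        "no explanation, and no code. For example: Coder"
    )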

tevslin commented 8 months ago

I would like to see whatever is implemented for GroupChatManager be exposed in AutoGen Studio.