microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License
20.19k stars 2.94k forks source link

.Net Agents: Investigate parallel agent execution in AutoGen #6783

Open crickman opened 1 week ago

crickman commented 1 week ago

AutoGen supports some form of parallel execution. Analyze how this is supported (heirarchical chat? etc..).

We know that an Open AI Assistant API does not allow updating a thread (creating a run) in parallel. Only one active run may exist for a given thread.

crickman commented 1 week ago

Analysis:

Autogen documentation describes serveral converation patterns:
https://microsoft.github.io/autogen/docs/tutorial/conversation-patterns

  1. A Single Agent Reply using agent.generate_reply: https://microsoft.github.io/autogen/docs/tutorial/introduction#agents

  2. Two Agent Chat using conversable_agent.initiate_chat: https://microsoft.github.io/autogen/docs/tutorial/conversation-patterns/#two-agent-chat-and-chat-result

  3. Sequential Chat using conversable_agent.initiate_chats: https://microsoft.github.io/autogen/docs/tutorial/conversation-patterns/#sequential-chats https://github.com/microsoft/autogen/blob/main/notebook/agentchats_sequential_chats.ipynb

  4. Group Chat using the GroupChat and GroupChatManager classes in conjunction with conversable_agent.initiate_chat or conversable_agent.initiate_chats: https://microsoft.github.io/autogen/docs/tutorial/conversation-patterns/#group-chat https://github.com/microsoft/autogen/blob/main/notebook/agentchat_groupchat.ipynb

  5. Nested Chat using conversable_agent.register_nested_chats in conjunction with agent.generate_reply: https://microsoft.github.io/autogen/docs/tutorial/conversation-patterns/#nested-chats https://github.com/microsoft/autogen/blob/main/notebook/agentchat_nestedchat.ipynb

Part of a chat is definition is a summary_method. This can be either: last_msg (default) or reflection_with_llm (an LLM produced summary). It is this summarization that is provided as carry-over between various chats. This carry-over concept is critical in propagating a chat result to a consuming chat.

A parallel chat is series of chat instances (sequential chat in the example notebook) in which each chat (or task) is may have a dependency on a previous chat (or task). This pattern is referred to as Async Chat: https://github.com/microsoft/autogen/blob/main/notebook/agentchat_multi_task_async_chats.ipynb

Whereas a solving multiple tasks without prerequisite definition appears to default to synchronous: https://github.com/microsoft/autogen/blob/main/notebook/agentchat_multi_task_chats.ipynb

Note: The API documentation may be out-of-sync with the examples as chat_id and prerequisites are not fully addressed.

For the async example (https://github.com/microsoft/autogen/blob/main/notebook/agentchat_multi_task_async_chats.ipynb), four tasks (chats) are defined:

  1. Lookup stock prices and performance
  2. Investigate reasons for performance
  3. Plot a graph of stock prices for the past month
  4. Produce a blog post using the information from the previous tasks.

In this case, #2 and #3 have a dependency on #1 (their prerequisites) and #4 requires all prevous tasks to be completed. This configuration allows #2 and #3 to execute in parallel asynchronously:

image

Each task maintains its own chat-history, with only the carry-over being shared betwen tasks / chats.

Note: A GroupChat is always selecting an agent-turn for synchronous execution as is any agent interaction with a single discrete chat / task.

Thoughts:

While we've focused on the GroupChat pattern in the Agent Framework, introducing a fundamental Chat primitive may provide the building block necessary to support asyncronous and nested conversational patterns. Such a primitive would also be able to support a well defined summarization concept along with other behavior modifiers (i.e,. silent, clear-history, etc...) while also providing a clear articulation point for defining user-interaction (i.e. user-proxy-agent, see: https://github.com/microsoft/semantic-kernel/issues/6758) as well as failure / retry-boundaries (https://github.com/microsoft/semantic-kernel/issues/6785).

matthewbolanos commented 1 week ago

How does the dependency graph get created? Is that something that is generated on the fly by the AI? By the developer? Or both?

crickman commented 1 week ago

In the AutoGen example its pro-code, but ultimately just a data-structure (although recipient is an agent instance)

    [
        {
            "chat_id": 1,
            "recipient": financial_assistant,
            "message": financial_tasks[0],
            "silent": False,
            "summary_method": "reflection_with_llm",
        },
        {
            "chat_id": 2,
            "prerequisites": [1],
            "recipient": research_assistant,
            "message": financial_tasks[1],
            "silent": False,
            "summary_method": "reflection_with_llm",
        },
        {
            "chat_id": 3,
            "prerequisites": [1],
            "recipient": financial_assistant,
            "message": financial_tasks[2],
            "silent": False,
            "summary_method": "reflection_with_llm",
        },
        {"chat_id": 4, "prerequisites": [1, 2, 3], "recipient": writer, "silent": False, "message": writing_tasks[0]},
    ]
tyler-suard-parker commented 1 week ago

I am interested in this feature too. It might be an idea to make the SK planner capable of writing the async agent graph, for autogen to then run.

joslat commented 4 days ago

I'd add that we need this very much, the basic chat primitive and enable at least the following constructs:

Reason: Those are super-powerful enablers to create really complex agentic flows, ideally a Nested chat should also be able to host another nested chat too. Ideally supporting parallelization/Async. And if we add some business sugar, isolated execution and some safeties on it - kind of what using a serious workflow component would do driven - so I can set the workflow to try to repeat any step in case it fails and if it goes over the max repetitions, maybe try later or have a fallback strategy.

Ideally event-driven and why not, applying the Actor pattern would suit this approach very much too, Orleans/AKKA are some favorites here :).

crickman commented 4 days ago

@tyler-suard-parker / @joslat - Thank you for this input. I'm also hopeful we will establish consensus to explore this direction soon!