microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License

.Net: New Feature: Allow use of an existing OpenAI Assistants thread when utilizing `AgentGroupChat` #8716

Open joeyj opened 1 month ago

joeyj commented 1 month ago

Currently, using AgentGroupChat with OpenAIAssistantAgent causes a new thread to be created and deleted for each chat. Since threads are an externally managed resource, is there a design where they could be managed outside of the chat (e.g., long-lived threads)?
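
To illustrate the current behavior, roughly (a sketch assuming the experimental Agents API as it stands today; exact signatures vary by Semantic Kernel version, and `RunOnceAsync` is just an illustrative helper with the agent created elsewhere via `OpenAIAssistantAgent.CreateAsync`):

```csharp
// Rough sketch of today's pattern (experimental Agents API; signatures may vary by SK version).
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Agents;
using Microsoft.SemanticKernel.Agents.OpenAI;
using Microsoft.SemanticKernel.ChatCompletion;

public static class Example
{
    public static async Task RunOnceAsync(OpenAIAssistantAgent agent, string userInput)
    {
        // The chat owns the conversation state: for an assistant agent it creates an
        // OpenAI thread internally, and the caller never sees the thread-id.
        AgentGroupChat chat = new(agent);
        chat.AddChatMessage(new ChatMessageContent(AuthorRole.User, userInput));

        await foreach (ChatMessageContent response in chat.InvokeAsync())
        {
            Console.WriteLine(response.Content);
        }

        // Resetting the chat deletes the thread it created, so nothing remains
        // on the OpenAI side to resume later.
        await chat.ResetAsync();
    }
}
```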

crickman commented 1 month ago

@joeyj - Thanks for this suggestion. I'd like to explore what this might look like.

I'm struggling a bit to reconcile this requirement with existing behaviors / use cases:

  1. Any instance of an agent is allowed to participate in more than one conversation or chat.
  2. AgentChat / AgentGroupChat doesn't have specific knowledge of any particular agent type.
  3. Different assistant agents may certainly target different thread-ids for the same conversation. For example, nothing stops a user from targeting models in different Azure regions / different endpoints, or even mixing an Azure endpoint with an OpenAI endpoint. The framework does not constrain this.

Introducing a ThreadId property on an assistant agent breaks point 1. Per point 2, AgentChat doesn't know what a thread is, so how would this be expressed? And per point 3, how many thread-ids, and for which agents?

We are working on a serialization feature that will allow the group-chat to be deserialized using the same threads. This would allow a long-lasting conversation. It wouldn't support joining a new AgentChat to existing threads, however.
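
Roughly, the shape might look something like this (purely illustrative; the serializer is still being designed, so treat every name below as a placeholder rather than a committed API):

```csharp
// Hypothetical sketch only: the serialization feature is in progress, so the
// AgentChatSerializer name and method shapes below are placeholders, not a committed API.
using System.IO;
using System.Threading.Tasks;
using Microsoft.SemanticKernel.Agents;

public static class PersistenceSketch
{
    public static async Task SaveAsync(AgentGroupChat chat, Stream destination)
    {
        // Capture the chat state, including references to the assistant threads it is using.
        await AgentChatSerializer.SerializeAsync(chat, destination);
    }

    public static async Task RestoreAsync(AgentGroupChat freshChat, Stream source)
    {
        // Rehydrate a newly constructed chat so it continues on the same threads.
        AgentChatSerializer serializer = await AgentChatSerializer.DeserializeAsync(source);
        await serializer.DeserializeAsync(freshChat);
    }
}
```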

Thoughts?

joeyj commented 1 month ago

@crickman Thanks for the detailed reply.

It might help to give a bit more context on how we are using Semantic Kernel. Currently, we are running it in a stateless distributed service to support scenarios like SMS webhooks to communicate with our customers. We end up loading and persisting ChatHistory on every call. OpenAI Assistants look very attractive for solving this, since they let us store our messages remotely for their entire lifetime without having to load/persist all of them every time.

We could instead use OpenAIAssistantAgent directly and manage the threads ourselves, but we'd miss out on AgentGroupChat for multi-agent scenarios when operating in this stateless fashion. In theory, we could use the same thread for multiple agents directly in OpenAI Assistants, which would remove the need to cache them locally in memory in Semantic Kernel.
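
For example, something along these lines (a sketch assuming the thread-management methods currently exposed on OpenAIAssistantAgent; exact names/signatures may differ by version, and `HandleSmsAsync` is just an illustrative helper):

```csharp
// Sketch of managing the thread ourselves with OpenAIAssistantAgent (names may vary by SK version).
// The only state persisted between webhook calls is the thread-id.
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Agents.OpenAI;
using Microsoft.SemanticKernel.ChatCompletion;

public static class StatelessExample
{
    public static async Task<string> HandleSmsAsync(OpenAIAssistantAgent agent, string? threadId, string sms)
    {
        // Reuse the caller's long-lived thread, or create one on first contact.
        threadId ??= await agent.CreateThreadAsync();

        await agent.AddChatMessageAsync(threadId, new ChatMessageContent(AuthorRole.User, sms));

        await foreach (ChatMessageContent response in agent.InvokeAsync(threadId))
        {
            Console.WriteLine(response.Content); // send the reply SMS here
        }

        // Return the thread-id so it can be stored (e.g., keyed by phone number) for the
        // next call; the messages themselves live in the OpenAI thread.
        return threadId;
    }
}
```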

Hopefully that context helps a bit.

Some thoughts on how this could be supported in Semantic Kernel: