.Net Agents: Should InvokeAsync for ChatCompletionAgent return an IAsyncEnumerable?

microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps

https://aka.ms/semantic-kernel

MIT License


.Net Agents: Should InvokeAsync for ChatCompletionAgent return an IAsyncEnumerable? #6808

Open matthewbolanos opened 2 weeks ago

matthewbolanos commented 2 weeks ago

Once we have the streaming version of agents, shouldn't ChatCompletionAgent just return a list of chat messages?

crickman commented 2 weeks ago

This decision point is not necessarily linked to streaming support.

RogerBarreto commented 2 weeks ago

I'm curious whether the streaming version of agents will provide a new API call for streaming, or whether the whole agent will be streaming. This relates to how we handle it in SK services, where the same service provides two APIs: one for streaming (IAsyncEnumerable) and one for non-streaming scenarios.

graph LR
Agent --> InvokeAsync
Agent --> InvokeStreamingAsync

graph LR
Agent --> InvokeAsync_
StreamingAgent --> InvokeAsync
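
A minimal C# sketch of the first diagram, where one agent exposes both entry points. The local functions `InvokeStreamingAsync` and `InvokeAsync` and the hard-coded chunks are hypothetical stand-ins, not the Semantic Kernel API. The sketch only illustrates that, under this layout, the non-streaming call could be layered on the streaming one by buffering.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical streaming entry point: yields partial content as it is produced.
// The chunks are hard-coded here; a real agent would forward model output.
async IAsyncEnumerable<string> InvokeStreamingAsync(string input)
{
    foreach (var chunk in new[] { "Echo: ", input, "!" })
    {
        await Task.Yield(); // stand-in for waiting on the model
        yield return chunk;
    }
}

// Hypothetical non-streaming entry point on the same agent:
// buffers the stream and returns the completed message in one piece.
async Task<string> InvokeAsync(string input)
{
    var buffer = new StringBuilder();
    await foreach (var chunk in InvokeStreamingAsync(input))
    {
        buffer.Append(chunk);
    }
    return buffer.ToString();
}

string complete = await InvokeAsync("hello");
Console.WriteLine(complete); // prints "Echo: hello!"
```

The second diagram would instead move the streaming method onto a separate agent type, so callers would choose the behavior by choosing the type rather than the method.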

crickman commented 2 weeks ago

@RogerBarreto - Likely the former. Do you have thoughts / suggestions?

crickman commented 2 weeks ago

@matthewbolanos - A signature modification is also indicated for https://github.com/microsoft/semantic-kernel/issues/6813.

crickman commented 2 weeks ago

@matthewbolanos - For the OpenAI Assistant case, a run can produce multiple messages, which are sometimes separated by tool calls that might have latency (e.g. code interpreter). IAsyncEnumerable allows earlier messages to be accessed while the run proceeds. Returning the messages synchronously, as a single list once the run completes, may increase the perceived latency.

This consideration likely also applies to chat-completion scenarios, but it would require reconsidering IChatCompletionService.
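
To make the latency point concrete, here is a rough C# sketch under stated assumptions: the two messages, the 200 ms delay standing in for a tool call, and the local functions `InvokeAsync` and `InvokeBufferedAsync` are all hypothetical and do not reflect the actual agent API. It compares when a caller first sees a message under each return shape.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical run: two messages separated by a slow tool call.
async IAsyncEnumerable<string> InvokeAsync()
{
    yield return "Let me compute that.";
    await Task.Delay(200); // stand-in for tool latency (e.g. code interpreter)
    yield return "The result is 42.";
}

// Alternative shape: collect every message and return only when the run ends.
async Task<IReadOnlyList<string>> InvokeBufferedAsync()
{
    var messages = new List<string>();
    await foreach (var message in InvokeAsync())
    {
        messages.Add(message);
    }
    return messages;
}

// Streaming shape: the first message is visible before the tool call finishes.
var clock = Stopwatch.StartNew();
var arrivals = new List<(string Message, long Ms)>();
await foreach (var message in InvokeAsync())
{
    arrivals.Add((message, clock.ElapsedMilliseconds));
    Console.WriteLine($"[{clock.ElapsedMilliseconds} ms] {message}");
}

// Buffered shape: nothing is visible until the whole run has completed.
clock.Restart();
var all = await InvokeBufferedAsync();
long bufferedMs = clock.ElapsedMilliseconds;
Console.WriteLine($"[{bufferedMs} ms] received {all.Count} messages at once");
```

Total run time is about the same in both cases; what differs is how long the caller waits before the first message can be shown.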