Open matthewbolanos opened 2 weeks ago
This decision point might not necessarily be linked to streaming support.
I'm curious whether the streaming version of agents will provide a new API call for streaming, or whether the whole agent will be streaming. This question relates to what we do in SK services, where we provide two APIs in the same service: one for Streaming (IAsyncEnumerable) and one for Non-Streaming scenarios.
graph LR
Agent --> InvokeAsync
Agent --> InvokeStreamingAsync
OR
graph LR
Agent --> InvokeAsync_
StreamingAgent --> InvokeAsync
@RogerBarreto - Likely the former. Do you have thoughts / suggestions?
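For illustration, here is a minimal sketch of what the former option could look like: a single agent type exposing both entry points, mirroring the two-API pattern used by SK services. Everything in it is an assumption made for the example, not the Semantic Kernel API: `SketchAgent` is a hypothetical type, messages are plain strings, and the return types (a completed list for `InvokeAsync`, an `IAsyncEnumerable` of chunks for `InvokeStreamingAsync`) are just one possible choice that this thread has not settled.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical agent (not the Semantic Kernel API): one type, two entry points.
var agent = new SketchAgent();

// Non-streaming: the caller waits for the complete messages.
IReadOnlyList<string> messages = await agent.InvokeAsync("hi");
Console.WriteLine(string.Join(" | ", messages));

// Streaming: the caller consumes chunks as they are produced.
await foreach (string chunk in agent.InvokeStreamingAsync("hi"))
{
    Console.Write(chunk);
}
Console.WriteLine();

public sealed class SketchAgent
{
    // Assumed non-streaming shape: returns only once every message is complete.
    public Task<IReadOnlyList<string>> InvokeAsync(string input)
    {
        IReadOnlyList<string> result = new[] { $"echo: {input}" };
        return Task.FromResult(result);
    }

    // Assumed streaming shape: yields partial content piece by piece.
    public async IAsyncEnumerable<string> InvokeStreamingAsync(string input)
    {
        foreach (string chunk in new[] { "echo", ": ", input })
        {
            await Task.Yield(); // stand-in for waiting on the model
            yield return chunk;
        }
    }
}
```

The latter option would move `InvokeStreamingAsync` onto a separate `StreamingAgent` type, so callers would pick the behavior by choosing a type rather than a method.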
@matthewbolanos - Signature modification also indicated for https://github.com/microsoft/semantic-kernel/issues/6813.
@matthewbolanos - For the OpenAI Assistant case, a run can produce multiple messages, which are sometimes separated by tool calls that might have latency (e.g. code-interpreter). IAsyncEnumerable enables earlier messages to be accessed while the run proceeds. Returning the messages in a synchronous fashion may increase the perceived latency.
This consideration is likely also applicable to chat-completion scenarios, but would require reconsideration of IChatCompletionService.
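The latency point can be sketched roughly as follows. This is a toy model, not the Assistant API: `SketchRun`, `InvokeAsListAsync`, the message text, and the 500 ms delay standing in for a tool call are all hypothetical and exist only for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

var clock = Stopwatch.StartNew();

// Enumerable return: each message is visible as soon as it is yielded.
await foreach (string message in SketchRun.InvokeAsync())
{
    Console.WriteLine($"[{clock.ElapsedMilliseconds} ms] enumerated: {message}");
}

clock.Restart();

// List return: nothing is visible until the whole run has finished.
List<string> all = await SketchRun.InvokeAsListAsync();
foreach (string message in all)
{
    Console.WriteLine($"[{clock.ElapsedMilliseconds} ms] from list: {message}");
}

public static class SketchRun
{
    // Hypothetical run: two messages separated by a slow tool call.
    public static async IAsyncEnumerable<string> InvokeAsync()
    {
        yield return "Let me compute that.";
        await Task.Delay(500); // stand-in for tool latency, e.g. code-interpreter
        yield return "The result is ready.";
    }

    // Same run, buffered into a list before anything is returned.
    public static async Task<List<string>> InvokeAsListAsync()
    {
        var messages = new List<string>();
        await foreach (string message in InvokeAsync())
        {
            messages.Add(message);
        }
        return messages;
    }
}
```

With the enumerable, the first message should appear almost immediately and the second after roughly the simulated delay. With the list, both appear only after the delay has elapsed, which is the perceived-latency cost described above.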
Once we have the streaming version of agents, shouldn't ChatCompletionAgent just return a list of chat messages?