Similar to this issue https://github.com/microsoft/semantic-kernel/issues/4627
This is likely caused by this bug in the Azure SDK: https://github.com/Azure/azure-sdk-for-net/issues/41838
It was fixed in https://github.com/Azure/azure-sdk-for-net/pull/41844, but a new build with the fix hasn't been published to NuGet yet.
In the meantime, try adding await Task.Yield() inside your foreach loop and see if that improves the streaming.
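For example (a minimal sketch of the workaround; the loop shape and the kernel/prompt names are illustrative, matching the repro below):

```csharp
// Yielding back to the scheduler between chunks breaks the synchronous
// continuation chain, letting each chunk flush to the HTTP response.
await foreach (var chunk in kernel.InvokePromptStreamingAsync(prompt))
{
    await Task.Yield(); // workaround: force the continuation to be scheduled asynchronously
    yield return chunk.ToString();
}
```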
@fabio-sp based on the response from Stephen, it looks like this isn't a Semantic Kernel issue, so I'm going to close this.
I can confirm that the workaround suggested by @stephentoub works fine. Thank you!
Describe the bug
In an ASP.NET Core controller I want to stream the token responses of a prompt issued to OpenAI back to the client. Despite yield-returning the individual response tokens, the response seems to be buffered server-side until the whole response stream from OpenAI is consumed, and only then is it returned to the client.
To Reproduce
A simple endpoint implementation like the following reproduces the problem.
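A minimal sketch, assuming the Kernel is registered via dependency injection (controller and route names are illustrative):

```csharp
using System.Collections.Generic;
using Microsoft.AspNetCore.Mvc;
using Microsoft.SemanticKernel;

[ApiController]
[Route("api/[controller]")]
public class ChatController : ControllerBase
{
    private readonly Kernel _kernel;

    public ChatController(Kernel kernel) => _kernel = kernel;

    // Streams each LLM token to the client as it is produced.
    [HttpGet]
    public async IAsyncEnumerable<string> Get([FromQuery] string prompt)
    {
        await foreach (var chunk in _kernel.InvokePromptStreamingAsync(prompt))
        {
            // Expected: each chunk is flushed immediately; observed: the whole
            // response is buffered until the OpenAI stream completes.
            yield return chunk.ToString();
        }
    }
}
```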
The full example can be found here: https://github.com/fabio-sp/sk-streaming-sample-webapi
Expected behavior
The response is streamed to the client from the controller without waiting for the whole LLM response to complete before returning.
Platform
Additional context
The problem seems to affect both the AzureOpenAI and OpenAI connectors; I could not test the other connectors as I have no access to those platforms. The issue also occurs when calling InvokePromptStreamingAsync directly on the Kernel instance, or when using the IChatCompletionService and its GetStreamingChatMessageContentsAsync method. All the different tests I made can be found in the repository linked above. We were using SK version 1.0.0-beta-3 with ITextCompletion and had no such problem, but a lot of things have changed since then.
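For reference, the chat-completion variant that shows the same buffering might look like this (a sketch of the body of an iterator method like the one above; names are illustrative):

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Same symptom via IChatCompletionService instead of InvokePromptStreamingAsync.
IChatCompletionService chat = kernel.GetRequiredService<IChatCompletionService>();
ChatHistory history = new();
history.AddUserMessage(prompt);

await foreach (var message in chat.GetStreamingChatMessageContentsAsync(history))
{
    yield return message.Content ?? string.Empty;
}
```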