Open rboen opened 1 month ago
@rboen Thanks for the ask.
We will keep track of this feature and investigate how to bring it to SK as a Speech-to-Speech streaming abstraction. For now, while we don't have this abstraction in place, our suggestion is to use our current APIs with the break-glass option (providing either the OpenAIClient or AzureOpenAIClient directly) and to consume the RealtimeConversationClient directly.
@RogerBarreto This can be implemented easily, even though the .NET C# code is a bit obfuscated (I tried to improve this here: https://github.com/Azure-Samples/aoai-realtime-audio-sdk/pull/28). It would be great to use this in the context of an Agent / Assistant where I can provide tools and a nice metaprompt.
Basically, using it as the "UserProxy Agent" ;) Otherwise we can only provide the initial prompt and that's all. We can also bind some tools here for function calling, but the current way to do this is hyper-counter-intuitive :( See here: https://github.com/joslat/aoai-realtime-audio-sdk/blob/main/dotnet/samples/console-from-file/Program.cs (lines 28, 238 and 109...)
When will this be implemented? This could be a game changer...
Low-latency conversational interaction using speech is an impressive enhancement and a game changer for audio chatbots. With the emergence of gpt-4o-realtime-preview in Azure OpenAI, I'd love to see an integration with Semantic Kernel in order to facilitate agents and skills / plugins in call agent scenarios.
Please have a look at https://github.com/azure-samples/aoai-realtime-audio-sdk