microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License

.Net: New Feature: Add support for GPT-4o Real Time endpoint #9075

Open rboen opened 1 month ago

rboen commented 1 month ago

Low-latency conversational speech interaction is an impressive enhancement and a game changer for audio chatbots. With the emergence of gpt-4o-realtime-preview in Azure OpenAI, I'd love to see an integration with Semantic Kernel to facilitate agents and skills / plugins in call-agent scenarios.

Please have a look at https://github.com/azure-samples/aoai-realtime-audio-sdk

RogerBarreto commented 1 month ago

@rboen Thanks for the ask.

We will keep track of this feature and investigate how to bring it to SK as a Speech-to-Speech streaming abstraction. For now, while that abstraction is not in place, our suggestion is to use our current APIs with the break-glass option (providing either the OpenAIClient or the AzureOpenAIClient directly) and to consume the RealtimeConversationClient directly.

joslat commented 1 month ago

@RogerBarreto This can be easily implemented, even though the .NET C# code is a bit obfuscated (I tried to improve this here: https://github.com/Azure-Samples/aoai-realtime-audio-sdk/pull/28). It would be great to use this in the context of an Agent / Assistant where I can provide tools and a nice metaprompt.

Basically, using it as the "UserProxy Agent" ;) Otherwise we can only provide the initial prompt and that's all. We can also bind some tools for function calling, but the current way to do this is hyper-counter-intuitive :( - see https://github.com/joslat/aoai-realtime-audio-sdk/blob/main/dotnet/samples/console-from-file/Program.cs (lines 28, 238 and 109...)

jerry2007 commented 1 month ago

When will this be implemented? This could be a game changer...