microsoft / semantic-kernel

Integrate cutting-edge LLM technology quickly and easily into your apps
https://aka.ms/semantic-kernel
MIT License

.Net: Add new AudioToText API to use services with locally deployed LLMs #6647

Open andreykaone opened 3 weeks ago

andreykaone commented 3 weeks ago

Thanks a lot for this project! It makes it very easy to add AI functionality to existing apps.

But I've noticed that while there are Azure and OpenAI connectors for the audio-to-text API, there is no way to use locally deployed LLMs. It would be nice to have the opportunity to 'talk to' our local models.

So basically, I propose adding a new OpenAIAudioToTextService(string modelId, Uri endpoint, ..) constructor with a Uri parameter that sets which backend to use, plus an AddOpenAIAudioToText(this IKernelBuilder builder, string modelId, Uri endpoint, ..) extension method for IKernelBuilder to configure the kernel with the STT service.
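A minimal sketch of how the proposed extension might be used. Everything beyond the names in the proposal (the model id, the endpoint URL, and the fluent builder call) is an assumption for illustration, not the final design:

```csharp
// Sketch only: assumes the proposed AddOpenAIAudioToText(modelId, endpoint, ..)
// overload exists and that a local OpenAI-compatible server is listening
// on the given endpoint (both the model name and port are hypothetical).
using Microsoft.SemanticKernel;

var kernel = Kernel.CreateBuilder()
    .AddOpenAIAudioToText(
        modelId: "whisper-1",                          // model name on the local server
        endpoint: new Uri("http://localhost:8000/v1")) // locally deployed backend
    .Build();
```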

I'm creating this issue because CONTRIBUTING.md suggests discussing such changes with the team first.

In fact, I already have an implementation for this issue and plan to open a PR soon.

markwallace-microsoft commented 3 weeks ago

Hi @andreykaone, thanks for creating the issue; looking forward to seeing your PR. I'm going to assign this to @RogerBarreto so he can help you get your PR reviewed.

dmytrostruk commented 3 weeks ago

> So basically, I propose to add a new OpenAIAudioToTextService(string modelId, Uri endpoint, ..) constructor with a Uri parameter to set which backend to use and an IKernelBuilder AddOpenAIAudioToText(this IKernelBuilder builder, string modelId, Uri endpoint, ..) extension for IKernelBuilder to configure kernel with STT service.

@andreykaone For local models, I would propose creating a separate {Name}AudioToTextService that implements the IAudioToTextService interface and adds functionality to access local models. The reason I wouldn't merge it with the OpenAI functionality is that local models could have features or configuration options that aren't applicable to OpenAI, and vice versa: OpenAI models may have functionality that isn't applicable to local models. Keeping the functionality for local models in a separate service gives us more flexibility to extend it in the future.
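For reference, a skeleton of such a dedicated service might look roughly like this. The class name stands in for the {Name} placeholder, the constructor and body are entirely hypothetical, and the interface member signatures reflect my reading of the Semantic Kernel abstractions, which may lag the current code:

```csharp
// Hypothetical sketch of a separate local-model service implementing
// IAudioToTextService; nothing below exists in Semantic Kernel today.
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.AudioToText;

public sealed class LocalAudioToTextService : IAudioToTextService
{
    private readonly Uri _endpoint;

    public LocalAudioToTextService(string modelId, Uri endpoint)
    {
        _endpoint = endpoint;
        this.Attributes = new Dictionary<string, object?> { ["ModelId"] = modelId };
    }

    // Required by IAIService: metadata describing this service instance.
    public IReadOnlyDictionary<string, object?> Attributes { get; }

    public Task<IReadOnlyList<TextContent>> GetTextContentsAsync(
        AudioContent content,
        PromptExecutionSettings? executionSettings = null,
        Kernel? kernel = null,
        CancellationToken cancellationToken = default)
    {
        // The local-model-specific transcription call would go here,
        // free to expose options that have no OpenAI counterpart.
        throw new NotImplementedException("Sketch only.");
    }
}
```

Because the class only depends on the shared IAudioToTextService abstraction, local-model-specific settings can live in its own constructor or execution settings without leaking into the OpenAI connector.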