livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹
https://docs.livekit.io/agents
Apache License 2.0
4.08k stars 433 forks

AzureOpenAI TTS and STT not available? #789

Open jeroenverboomPNL opened 2 months ago

jeroenverboomPNL commented 2 months ago

Hi!

Is it correct that the livekit-azure-plugin only works with Azure's native STT and TTS services, and that AzureOpenAI's STT and TTS are not in the livekit SDK yet?

I do see the functionality for the LLM; curious to hear whether this is currently being developed or whether I should build my own plugin.

Warm regards,

Jeroen

davidzhao commented 2 months ago

Azure's OpenAI STT/TTS should work with the openai.stt/tts packages since they are API-compatible with OpenAI, though it'd be great to confirm that.

Are you interested in trying it? If they do work, it'd be great to have a similar with_azure wrapper as well.
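One thing worth keeping in mind when testing the API-compatible route: Azure selects the model through a deployment name in the URL path and requires an api-version query parameter, while OpenAI selects the model in the request body. A rough, stdlib-only sketch of that difference (the resource and deployment names below are made-up placeholders, not values from this thread):

```python
# Sketch of the request-path difference between the two services.
# Resource and deployment names are hypothetical placeholders.

OPENAI_BASE = "https://api.openai.com/v1"
AZURE_ENDPOINT = "https://my-resource.openai.azure.com"  # hypothetical resource
AZURE_API_VERSION = "2024-02-01"

def openai_transcription_url() -> str:
    # OpenAI: one fixed path; the model is chosen in the request body.
    return f"{OPENAI_BASE}/audio/transcriptions"

def azure_transcription_url(deployment: str) -> str:
    # Azure: the deployment (model) name is part of the path, and every
    # request must carry an api-version query parameter.
    return (
        f"{AZURE_ENDPOINT}/openai/deployments/{deployment}"
        f"/audio/transcriptions?api-version={AZURE_API_VERSION}"
    )

print(openai_transcription_url())
print(azure_transcription_url("my-whisper"))
```

This is why simply pointing base_url at the Azure resource may not be enough on its own, and why the AzureOpenAI client (which builds these paths for you) is the safer starting point.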

jeroenverboomPNL commented 2 months ago

Hi @davidzhao,

Thanks for the prompt response!

I am still setting up my AzureOpenAI instance, I will try to check it out in the coming weeks.

Here is my idea:

  1. Initialise the client:

        import os

        from openai import AzureOpenAI

        azure_client = AzureOpenAI(
            api_key=os.getenv("AZURE_OPENAI_API_KEY"),
            api_version="2024-02-01",
            azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        )
  2. Pass the Azure details to the openai.STT class when constructing the assistant:

        assistant = VoiceAssistant(
            vad=silero.VAD.load(),
            stt=openai.STT(
                language="en",
                detect_language=False,
                model=<AZURE_STT_MODEL_DEPLOYMENT_NAME>,
                base_url=<AZURE_OPENAI_ENDPOINT_URL>,
                api_key=<AZURE_OPENAI_API_KEY>,
                client=azure_client,
            ),
            ...
        )

If that does not work, I'll try to write a with_azure wrapper similar to the one on the openai.LLM class.
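For what it's worth, the wrapper itself could be quite small: a classmethod that resolves the Azure-specific settings (from arguments or the environment) and then delegates to the normal constructor. A minimal, library-free sketch of that pattern, where the STT class and its parameters are stand-ins and not the actual livekit API:

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass
class STT:
    """Stand-in for a plugin STT class; not the real livekit interface."""
    model: str
    base_url: str
    api_key: str

    @classmethod
    def with_azure(
        cls,
        *,
        deployment: str,
        azure_endpoint: Optional[str] = None,
        api_key: Optional[str] = None,
    ) -> "STT":
        # Fall back to environment variables when values are not passed
        # explicitly, mirroring how with_azure-style helpers typically behave.
        endpoint = azure_endpoint or os.environ["AZURE_OPENAI_ENDPOINT"]
        key = api_key or os.environ["AZURE_OPENAI_API_KEY"]
        return cls(model=deployment, base_url=endpoint, api_key=key)

# Hypothetical usage with placeholder values:
stt = STT.with_azure(
    deployment="my-whisper-deployment",
    azure_endpoint="https://my-resource.openai.azure.com",
    api_key="placeholder-key",
)
print(stt.model)
```

The benefit of this shape is that callers who don't use Azure never see the Azure-specific parameters, and the default constructor stays unchanged.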