livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹
https://docs.livekit.io/agents
Apache License 2.0

Azure config param throws error. #942

Closed: yakhyo closed this issue 1 month ago

yakhyo commented 1 month ago

Hi, thank you for this nice tool.

I am using Azure TTS with a custom prosody config, but I get an error from the library. How can I fix this?

My current code snippet:

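```python
from livekit.agents import AutoSubscribe, JobContext, llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import azure, deepgram, openai
from livekit.plugins.azure.tts import ProsodyConfig

prompt = "..."  # placeholder: my system prompt is defined elsewhere

# Prosody settings object accepted by azure.TTS (an object, not a file)
config = ProsodyConfig(rate="fast")

async def entrypoint(ctx: JobContext):
    initial_ctx = llm.ChatContext().append(
        role="system",
        text=prompt,
    )

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Wait for the first participant to connect
    participant = await ctx.wait_for_participant()

    assistant = VoicePipelineAgent(
        vad=ctx.proc.userdata["vad"],  # VAD loaded in the prewarm step
        stt=deepgram.STT(language="ko"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=azure.TTS(
            voice="ko-KR-JiMinNeural",
            language="ko-KR",
            prosody=config,
        ),
        chat_ctx=initial_ctx,
    )
    assistant.start(ctx.room, participant)
```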

Below is the Azure `TTS` class from the plugin; the config is passed through its `prosody` parameter:

```python
class TTS(tts.TTS):
    def __init__(
        self,
        *,
        speech_key: str | None = None,
        speech_region: str | None = None,
        voice: str | None = None,
        endpoint_id: str | None = None,
        language: str | None = None,
        prosody: ProsodyConfig | None = None,
    ) -> None:
        ...  # constructor body omitted
```

When I do so, I get a `ValueError`:

```
raise ValueError(
ValueError: failed to synthesize audio: ResultReason.Canceled: CancellationReason.Error (Connection was closed by the remote host. Error code: 1007. Error details: Ssml should contain at least one [VOICE] tag. USP state: TurnStarted. Received audio size: 0 bytes.)
```

I would appreciate any help with this.

harmlessman commented 1 month ago

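The problem is that the SSML built by the Azure TTS plugin has no `<voice>` tag. When a prosody config is set, the plugin synthesizes from SSML instead of plain text, and Azure rejects any SSML that does not contain at least one `<voice>` element, which is exactly what the error message says.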
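The requirement from the error message can be checked locally before anything is sent to Azure. Below is a minimal sketch using only the standard library; `has_voice_tag` is a hypothetical helper (not part of the plugin), and the two SSML strings are illustrative documents, not the plugin's exact output.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def has_voice_tag(ssml: str) -> bool:
    """Return True if the SSML document contains at least one <voice> element."""
    root = ET.fromstring(ssml)
    # SSML elements live in the SSML namespace, so search with {namespace}tag.
    return root.find(f".//{{{SSML_NS}}}voice") is not None


# Illustrative: <prosody> directly under <speak>, with no <voice> wrapper.
without_voice = (
    f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="ko-KR">'
    '<prosody rate="fast">안녕하세요</prosody></speak>'
)
# Illustrative: the same content wrapped in a <voice> element.
with_voice = (
    f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="ko-KR">'
    '<voice name="ko-KR-JiMinNeural"><prosody rate="fast">안녕하세요</prosody></voice></speak>'
)

print(has_voice_tag(without_voice))  # False
print(has_voice_tag(with_voice))     # True
```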

This can be fixed by changing the `_synthesize` function around line 180 of `livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py`:

```python
from xml.sax.saxutils import escape  # module-level import in tts.py

def _synthesize() -> speechsdk.SpeechSynthesisResult:
    if self._opts.prosody:
        # Azure requires SSML to name a voice, so wrap the text in <voice>
        ssml = f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="{self._opts.language or "en-US"}">'
        voice_ssml = f'<voice name="{self._opts.voice}">'
        prosody_ssml = "<prosody"

        # Only emit the prosody attributes that were actually set
        if self._opts.prosody.rate:
            prosody_ssml += f' rate="{self._opts.prosody.rate}"'
        if self._opts.prosody.volume:
            prosody_ssml += f' volume="{self._opts.prosody.volume}"'
        if self._opts.prosody.pitch:
            prosody_ssml += f' pitch="{self._opts.prosody.pitch}"'
        prosody_ssml += ">"
        ssml += voice_ssml
        ssml += prosody_ssml
        ssml += escape(self._text)  # escape &, < and > so the text cannot break the SSML
        ssml += "</prosody></voice></speak>"
        return synthesizer.speak_ssml_async(ssml).get()  # type: ignore

    # No prosody: synthesize the plain text directly
    return synthesizer.speak_text_async(self._text).get()  # type: ignore
```

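For reference, here is an SDK-free sketch of the SSML document the patch above is meant to produce, so the structure can be inspected without calling Azure. `Prosody` is a hypothetical stand-in for the plugin's `ProsodyConfig` and `build_ssml` is a hypothetical helper; neither is plugin API.

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


@dataclass
class Prosody:
    """Hypothetical stand-in for the plugin's ProsodyConfig."""
    rate: Optional[str] = None
    volume: Optional[str] = None
    pitch: Optional[str] = None


def build_ssml(
    text: str,
    voice: str,
    language: str = "en-US",
    prosody: Optional[Prosody] = None,
) -> str:
    """Build <speak><voice>[<prosody>]text[</prosody>]</voice></speak>."""
    # Collect only the prosody attributes that were actually set.
    attrs = ""
    if prosody is not None:
        for name in ("rate", "volume", "pitch"):
            value = getattr(prosody, name)
            if value:
                attrs += f" {name}={quoteattr(value)}"

    # Escape &, < and > so arbitrary LLM output cannot break the XML.
    body = escape(text)
    if attrs:
        body = f"<prosody{attrs}>{body}</prosody>"

    # The <voice> element is always present: Azure rejects SSML without one.
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(language)}>'
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )


print(build_ssml("Hello & welcome", "ko-KR-JiMinNeural", "ko-KR", Prosody(rate="fast")))
```

Running it prints `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="ko-KR"><voice name="ko-KR-JiMinNeural"><prosody rate="fast">Hello &amp; welcome</prosody></voice></speak>`. Unlike the patch, the sketch omits the `<prosody>` wrapper when no attribute is set and uses `quoteattr` so attribute values are quoted safely.
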
theomonnom commented 1 month ago

Hey, it looks like this was fixed in this PR: https://github.com/livekit/agents/pull/929

yakhyo commented 3 weeks ago

Unfortunately, that did not solve the problem; I still cannot pass the prosody config. What might I be doing wrong here? Thank you.

harmlessman commented 3 weeks ago

Now that https://github.com/livekit/agents/pull/929 has been merged, try pulling the latest project source code and running it again.

yakhyo commented 3 weeks ago

Thank you, @harmlessman!