livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹
https://docs.livekit.io/agents
Apache License 2.0

TTS Streaming is Broken for livekit-plugins-elevenlabs v0.4.0 #289

Closed technoligest closed 3 months ago

technoligest commented 4 months ago

This is a clearer description + RCA of #279.

I think the issue is in the TTS streaming implementation of livekit-plugins-elevenlabs: when I comment out this code in assistant.py, the very first assistant.say call is sent down to the agent playground.

Debugging further, I was able to verify that OpenAI is sending down the correct response to voices, but it wasn't getting streamed as audio to the playground.

I also tried downgrading to all the 0.4.0 dev versions, and none of them fixed the issue. 0.3.0 didn't work at all with the other packages.

The code I'm using is from the official agents quickstart guide.

fjprobos commented 4 months ago

I am facing the same limitation. Calling _synthesize_task with a raw-text _SpeechData object works perfectly, though.

Something else I don't understand: _play_speech_if_validated is supposed to validate that the streaming information is available in po_tx, yet po_tx seems to be empty when entering _synthesize_task.

technoligest commented 4 months ago

@theomonnom I commented out the exact code you're commenting out in your PR and it didn't fix it. Your change is probably good, but it doesn't fix the root cause of my issue.

fjprobos commented 4 months ago

Agreed, this is something different. It seems that po_tx is empty when we arrive at that point in the code. That shouldn't happen, as po_tx was already validated as containing data. Something is not working with the streaming on the plugin side.

gullerg commented 3 months ago

Facing the same issue while trying to run the minimal voice assistant demo. I can see that audio is properly detected, transcribed, and sent to the OpenAI API, but no speech is returned.

fjprobos commented 3 months ago

I logged what the receiver gets from ElevenLabs after sending the first chunk (here), and I obtained the following message:

{"message":"Unusual activity detected. Free Tier usage disabled. If you are using a proxy/VPN you might need to purchase a Paid Plan to not trigger our abuse detectors. Free Tier only works if users do not abuse it, for example by creating multiple free accounts. If we notice that many people try to abuse it, we will need to reconsider Free Tier altogether. \nPlease play fair and purchase any Paid Subscription to continue.","error":"detected_unusual_activity","code":1008}

Any idea whether this is related to the implementation (for example, how the websockets are being opened and closed) or has some other cause? I am in the trial period, and this service should work. The same API key works fine with the REST API.
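
For anyone debugging a similar silent failure, here is a minimal sketch of how a receive loop could surface an error payload like the one above instead of only reporting a closed connection. It assumes the service sends JSON text frames and that error frames carry `error`, `code` and `message` keys, as in the message quoted above. `TTSServiceError` and `check_frame` are hypothetical names, not part of livekit-plugins-elevenlabs.

```python
import json
from typing import Optional


class TTSServiceError(Exception):
    """Hypothetical exception carrying the provider's error name and code."""

    def __init__(self, error: str, code: Optional[int], message: str) -> None:
        super().__init__(f"{error} (code {code}): {message}")
        self.error = error
        self.code = code


def check_frame(raw: str) -> dict:
    """Parse one JSON text frame and raise if it is an error payload."""
    data = json.loads(raw)
    if isinstance(data, dict) and "error" in data:
        # Assumption: error frames look like the payload quoted in this thread.
        raise TTSServiceError(data["error"], data.get("code"), data.get("message", ""))
    return data
```

Calling `check_frame` on every received text frame before decoding audio would turn the quoted payload into an exception that names `detected_unusual_activity`, which is easier to act on than a generic disconnect.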

theRealMarkCastillo commented 3 months ago

I had the same error as @fjprobos and @gullerg. It seems the Free Tier has limits. I subscribed to ElevenLabs and now it works fine. Here is the error from the agent:

{"asctime": "2024-05-17 13:48:41,051", "level": "ERROR", "name": "livekit.plugins.elevenlabs", "message": "11labs connection failed\nTraceback (most recent call last):\n  File \"/Users/*****/git/livekit-test/.venv/lib/python3.11/site-packages/livekit/plugins/elevenlabs/tts.py\", line 367, in _run_ws\n    await asyncio.gather(send_task(), recv_task())\n  File \"/Users/theRealMarkCastillo/git/livekit-test/.venv/lib/python3.11/site-packages/livekit/plugins/elevenlabs/tts.py\", line 342, in recv_task\n    raise Exception(\"11labs connection closed unexpectedly\")\nException: 11labs connection closed unexpectedly\n", "job_id": "****", "pid": ****}

gullerg commented 3 months ago

Okay, thanks @theRealMarkCastillo. I tried switching to the OpenAI TTS, but streaming didn't seem to be supported. Has anyone had any luck using OpenAI TTS?

mchamoudadev commented 3 months ago

Is there any progress on this? I'm also facing an issue where Eleven Labs isn't sending the streaming to the client.

fjprobos commented 3 months ago

I was able to make it work after upgrading to a paid ElevenLabs subscription. They don't communicate that the free trial has this limitation.

AKingSSS commented 3 months ago

{"asctime": "2024-05-22 10:48:38,712", "level": "WARNING", "name": "livekit.plugins.elevenlabs", "message": "failed to connect to 11labs, retrying in 0s", "taskName": "Task-12", "job_id": "AJ_Xz4RqLvwVwUu", "pid": 46658}

Data is returned when I call the API directly, but running the example from the official website reports this error. Is it saying that the connection has timed out?

keepingitneil commented 3 months ago

Most likely related to: https://github.com/livekit/agents/issues/305

Re: OpenAI TTS. Streaming is supported with our StreamAdapter class. Examples coming soon.
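
As a rough illustration of the idea behind such an adapter (not the LiveKit implementation), the sketch below buffers streamed text, cuts it into sentences of a minimum length, and sends each sentence to a non-streaming synthesis call. `split_sentences`, `stream_adapter` and the `synthesize` callback are hypothetical names. The regex boundary rule is a simplification of what a real sentence tokenizer does.

```python
import asyncio
import re
from typing import AsyncIterator, Awaitable, Callable

# A sentence boundary is ., ! or ? followed by whitespace (simplified rule).
_BOUNDARY = re.compile(r"[.!?]\s+")


async def split_sentences(
    tokens: AsyncIterator[str], min_len: int = 20
) -> AsyncIterator[str]:
    """Group streamed text chunks into sentences of at least min_len characters."""
    buf = ""
    async for tok in tokens:
        buf += tok
        while True:
            # First boundary that leaves a sentence of at least min_len characters.
            cut = next(
                (m for m in _BOUNDARY.finditer(buf) if m.start() + 1 >= min_len),
                None,
            )
            if cut is None:
                break
            yield buf[: cut.start() + 1]
            buf = buf[cut.end():]
    if buf.strip():
        # Flush whatever is left when the text stream ends.
        yield buf.strip()


async def stream_adapter(
    tokens: AsyncIterator[str],
    synthesize: Callable[[str], Awaitable[bytes]],
    min_len: int = 20,
) -> AsyncIterator[bytes]:
    """Yield one audio chunk per sentence, using a non-streaming synthesize call."""
    async for sentence in split_sentences(tokens, min_len):
        yield await synthesize(sentence)
```

The trade-off in this design is latency versus quality: a smaller `min_len` starts audio sooner but gives the TTS less context per request.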

gullerg commented 3 months ago

@keepingitneil great!! Looking forward to the OpenAI TTS examples. I commented the below on another issue (closed now, though).

Just pulled latest from main and tried to run the OpenAI TTS. Now, I get the following error:

{"asctime": "2024-05-22 18:15:32,198", "level": "WARNING", "name": "livekit.agents", "message": "Running <Task pending name='Task-17' coro=<VoiceAssistant._synthesize_task() running at /Users/***/dev/livekit/venv2/lib/python3.11/site-packages/livekit/agents/voice_assistant/assistant.py:737> wait_for=<Future pending cb=[Task.task_wakeup()]>> took too long: 4.99 seconds", "job_id": "AJ_25Hy5b7Cj9xj", "pid": 49646}
{"asctime": "2024-05-22 18:15:32,204", "level": "ERROR", "name": "livekit.plugins.openai", "message": "openai tts main task failed in chunked stream\nTraceback (most recent call last):\n  File \"/Users/***/dev/livekit/venv2/lib/python3.11/site-packages/livekit/plugins/openai/tts.py\", line 89, in _run\n    async with self._session.post(\n  File \"/Users/***/dev/livekit/venv2/lib/python3.11/site-packages/aiohttp/client.py\", line 1197, in __aenter__\n    self._resp = await self._coro\n                 ^^^^^^^^^^^^^^^^\n  File \"/Users/***/dev/livekit/venv2/lib/python3.11/site-packages/aiohttp/client.py\", line 437, in _request\n    data = payload.JsonPayload(json, dumps=self._json_serialize)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/***/dev/livekit/venv2/lib/python3.11/site-packages/aiohttp/payload.py\", line 396, in __init__\n    dumps(value).encode(encoding),\n    ^^^^^^^^^^^^\n  File \"/Users/***/.pyenv/versions/3.11.1/lib/python3.11/json/__init__.py\", line 231, in dumps\n    return _default_encoder.encode(obj)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/***/.pyenv/versions/3.11.1/lib/python3.11/json/encoder.py\", line 200, in encode\n    chunks = self.iterencode(o, _one_shot=True)\n             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/Users/***/.pyenv/versions/3.11.1/lib/python3.11/json/encoder.py\", line 258, in iterencode\n    return _iterencode(o, 0)\n           ^^^^^^^^^^^^^^^^^\n  File \"/Users/***/.pyenv/versions/3.11.1/lib/python3.11/json/encoder.py\", line 180, in default\n    raise TypeError(f'Object of type {o.__class__.__name__} '\nTypeError: Object of type _LiteralGenericAlias is not JSON serializable\n", "job_id": "AJ_25Hy5b7Cj9xj", "pid": 49646}

This is my code:

import asyncio
import logging

from livekit import agents

from livekit.agents import JobContext, JobRequest, WorkerOptions, cli
from livekit.agents.llm import (
    ChatContext,
    ChatMessage,
    ChatRole,
)
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, nltk, openai, silero

# This function is the entrypoint for the agent.
async def entrypoint(ctx: JobContext):
    # Create an initial chat context with a system prompt 
    initial_ctx = ChatContext(
        messages=[
            ChatMessage(
                role=ChatRole.SYSTEM,
                text="You are a voice assistant created by LiveKit. Your interface with users will be voice. Pretend we're having a conversation, no special formatting or headings, just natural speech.",
            )
        ]
    )

    # VoiceAssistant is a class that creates a full conversational AI agent.
    # See https://github.com/livekit/agents/blob/main/livekit-agents/livekit/agents/voice_assistant/assistant.py
    # for details on how it works.

    openai_tts = openai.TTS(
            model=openai.TTSModels, 
            voice=openai.TTSVoices
    )
    STREAM_SENT_TOKENIZER = nltk.SentenceTokenizer(min_sentence_len=20)
    tts = agents.tts.StreamAdapter(tts=openai_tts, sentence_tokenizer=STREAM_SENT_TOKENIZER)

    assistant = VoiceAssistant(
        vad=silero.VAD(), # Voice Activity Detection
        stt=deepgram.STT(), # Speech-to-Text
        llm=openai.LLM(), # Language Model
        tts=tts,
        chat_ctx=initial_ctx, # Chat history context
    )

    # Start the voice assistant with the LiveKit room
    assistant.start(ctx.room)

    await asyncio.sleep(3)

    # Greets the user with an initial message
    await assistant.say("Hey, how can I help you today?", allow_interruptions=True)

# This function is called when the worker receives a job request
# from a LiveKit server.
async def request_fnc(req: JobRequest) -> None:
    logging.info("received request %s", req)
    # Accept the job tells the LiveKit server that this worker
    # wants the job. After the LiveKit server acknowledges that job is accepted,
    # the entrypoint function is called.
    await req.accept(entrypoint)

if __name__ == "__main__":
    # Initialize the worker with the request function
    cli.run_app(WorkerOptions(request_fnc))

theomonnom commented 3 months ago

Hey, these arguments are not valid; they must be strings:

            model=openai.TTSModels, 
            voice=openai.TTSVoices
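
A small standalone sketch of why the traceback above ends in `TypeError: Object of type _LiteralGenericAlias is not JSON serializable`: a `typing.Literal` alias is a type object, not a value, so `json.dumps` cannot encode it. The two aliases and `build_payload` below are hypothetical stand-ins assumed to resemble the plugin's definitions. `tts-1` and `alloy` are existing OpenAI TTS model and voice names.

```python
import json
from typing import Literal, get_args

# Hypothetical stand-ins for the plugin's type aliases (assumed, not copied).
TTSModels = Literal["tts-1", "tts-1-hd"]
TTSVoices = Literal["alloy", "echo", "fable", "onyx", "nova", "shimmer"]


def build_payload(model, voice, text: str) -> str:
    """Serialize a request body the way an HTTP client would."""
    return json.dumps({"model": model, "voice": voice, "input": text})


# Passing the aliases themselves fails at serialization time.
try:
    build_payload(TTSModels, TTSVoices, "Hey, how can I help you today?")
except TypeError as exc:
    print(exc)

# Passing one of the allowed string values works.
print(build_payload("tts-1", "alloy", "Hey, how can I help you today?"))

# The allowed values can be listed from the alias with typing.get_args.
print(get_args(TTSModels))
```

In the code from the comment above, the equivalent change would be passing string values such as `model="tts-1"` and `voice="alloy"` to `openai.TTS`.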

theomonnom commented 3 months ago

Hey, this should now be fixed in the newer versions. It was due to 11labs changing their pricing policy.