Closed: technoligest closed this issue 5 months ago.
I am facing the same limitation. Calling _synthesize_task with a raw-text _SpeechData object works perfectly, though. Something else I don't understand: _play_speech_if_validated is supposed to validate that the streaming data is available in po_tx, yet po_tx appears to be empty by the time _synthesize_task is entered.
@theomonnom I commented out the exact code you're changing in your PR and it didn't fix it. Your change is probably good, but it doesn't address the root cause of my issue.
Agree, this is something different. It seems that po_tx is empty when we arrive at that point in the code, which shouldn't happen, since po_tx was already validated as containing data. Something is not working with the streaming on the plugin side.
Facing the same issue while trying to run the minimal voice assistant demo. I can see that audio is properly detected, transcribed, and sent to the OpenAI API, but no speech is returned.
I logged what the receiver gets from ElevenLabs after sending the first chunk (here), and I obtained the following message:
{"message":"Unusual activity detected. Free Tier usage disabled. If you are using a proxy/VPN you might need to purchase a Paid Plan to not trigger our abuse detectors. Free Tier only works if users do not abuse it, for example by creating multiple free accounts. If we notice that many people try to abuse it, we will need to reconsider Free Tier altogether. \nPlease play fair and purchase any Paid Subscription to continue.","error":"detected_unusual_activity","code":1008}
Any idea whether this is related to the implementation (for example, how the websockets are being opened and closed), or to some other cause? I am within the trial period, and this service should work; the same API key works fine with the REST API.
I had the same error as @fjprobos and @gullerg. It seems the Free Tier has limits. I subscribed to ElevenLabs and now it works fine. Here is the error from the agent:
{"asctime": "2024-05-17 13:48:41,051", "level": "ERROR", "name": "livekit.plugins.elevenlabs", "message": "11labs connection failed\nTraceback (most recent call last):\n File \"/Users/*****/git/livekit-test/.venv/lib/python3.11/site-packages/livekit/plugins/elevenlabs/tts.py\", line 367, in _run_ws\n await asyncio.gather(send_task(), recv_task())\n File \"/Users/theRealMarkCastillo/git/livekit-test/.venv/lib/python3.11/site-packages/livekit/plugins/elevenlabs/tts.py\", line 342, in recv_task\n raise Exception(\"11labs connection closed unexpectedly\")\nException: 11labs connection closed unexpectedly\n", "job_id": "****", "pid": ****}
Okay, thanks @theRealMarkCastillo. I tried switching to the OpenAI TTS, but streaming didn't seem to be supported. Has anyone had any luck using OpenAI TTS?
Is there any progress on this? I'm also facing an issue where ElevenLabs isn't streaming the audio to the client.
I was able to make it work after upgrading to an ElevenLabs paid subscription. They don't communicate that the free trial has this limitation.
{"asctime": "2024-05-22 10:48:38,712", "level": "WARNING", "name": "livekit.plugins.elevenlabs", "message": "failed to connect to 11labs, retrying in 0s", "taskName": "Task-12", "job_id": "AJ_Xz4RqLvwVwUu", "pid": 46658}
Data is returned when calling the API directly, but running the official example reports an error saying the connection has timed out. Why?
most likely related to: https://github.com/livekit/agents/issues/305
Re: OpenAI TTS. Streaming is supported with our StreamAdapter class. Examples coming soon
@keepingitneil great!! Looking forward to the OpenAI TTS. I commented the below on another issue (closed now, though).
Just pulled the latest from main and tried to run the OpenAI TTS. Now I get the following error:
{"asctime": "2024-05-22 18:15:32,198", "level": "WARNING", "name": "livekit.agents", "message": "Running <Task pending name='Task-17' coro=<VoiceAssistant._synthesize_task() running at /Users/***/dev/livekit/venv2/lib/python3.11/site-packages/livekit/agents/voice_assistant/assistant.py:737> wait_for=<Future pending cb=[Task.task_wakeup()]>> took too long: 4.99 seconds", "job_id": "AJ_25Hy5b7Cj9xj", "pid": 49646}
{"asctime": "2024-05-22 18:15:32,204", "level": "ERROR", "name": "livekit.plugins.openai", "message": "openai tts main task failed in chunked stream\nTraceback (most recent call last):\n File \"/Users/***/dev/livekit/venv2/lib/python3.11/site-packages/livekit/plugins/openai/tts.py\", line 89, in _run\n async with self._session.post(\n File \"/Users/***/dev/livekit/venv2/lib/python3.11/site-packages/aiohttp/client.py\", line 1197, in __aenter__\n self._resp = await self._coro\n ^^^^^^^^^^^^^^^^\n File \"/Users/***/dev/livekit/venv2/lib/python3.11/site-packages/aiohttp/client.py\", line 437, in _request\n data = payload.JsonPayload(json, dumps=self._json_serialize)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/***/dev/livekit/venv2/lib/python3.11/site-packages/aiohttp/payload.py\", line 396, in __init__\n dumps(value).encode(encoding),\n ^^^^^^^^^^^^\n File \"/Users/***/.pyenv/versions/3.11.1/lib/python3.11/json/__init__.py\", line 231, in dumps\n return _default_encoder.encode(obj)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/***/.pyenv/versions/3.11.1/lib/python3.11/json/encoder.py\", line 200, in encode\n chunks = self.iterencode(o, _one_shot=True)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/***/.pyenv/versions/3.11.1/lib/python3.11/json/encoder.py\", line 258, in iterencode\n return _iterencode(o, 0)\n ^^^^^^^^^^^^^^^^^\n File \"/Users/***/.pyenv/versions/3.11.1/lib/python3.11/json/encoder.py\", line 180, in default\n raise TypeError(f'Object of type {o.__class__.__name__} '\nTypeError: Object of type _LiteralGenericAlias is not JSON serializable\n", "job_id": "AJ_25Hy5b7Cj9xj", "pid": 49646}
This is my code:
import asyncio
import logging

from livekit import agents
from livekit.agents import JobContext, JobRequest, WorkerOptions, cli
from livekit.agents.llm import (
    ChatContext,
    ChatMessage,
    ChatRole,
)
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, nltk, openai, silero


# This function is the entrypoint for the agent.
async def entrypoint(ctx: JobContext):
    # Create an initial chat context with a system prompt
    initial_ctx = ChatContext(
        messages=[
            ChatMessage(
                role=ChatRole.SYSTEM,
                text="You are a voice assistant created by LiveKit. Your interface with users will be voice. Pretend we're having a conversation, no special formatting or headings, just natural speech.",
            )
        ]
    )

    # VoiceAssistant is a class that creates a full conversational AI agent.
    # See https://github.com/livekit/agents/blob/main/livekit-agents/livekit/agents/voice_assistant/assistant.py
    # for details on how it works.
    openai_tts = openai.TTS(
        model=openai.TTSModels,
        voice=openai.TTSVoices,
    )
    STREAM_SENT_TOKENIZER = nltk.SentenceTokenizer(min_sentence_len=20)
    tts = agents.tts.StreamAdapter(tts=openai_tts, sentence_tokenizer=STREAM_SENT_TOKENIZER)

    assistant = VoiceAssistant(
        vad=silero.VAD(),      # Voice Activity Detection
        stt=deepgram.STT(),    # Speech-to-Text
        llm=openai.LLM(),      # Language Model
        tts=tts,
        chat_ctx=initial_ctx,  # Chat history context
    )

    # Start the voice assistant with the LiveKit room
    assistant.start(ctx.room)
    await asyncio.sleep(3)

    # Greet the user with an initial message
    await assistant.say("Hey, how can I help you today?", allow_interruptions=True)


# This function is called when the worker receives a job request
# from a LiveKit server.
async def request_fnc(req: JobRequest) -> None:
    logging.info("received request %s", req)
    # Accepting the job tells the LiveKit server that this worker
    # wants the job. After the LiveKit server acknowledges that the job is
    # accepted, the entrypoint function is called.
    await req.accept(entrypoint)


if __name__ == "__main__":
    # Initialize the worker with the request function
    cli.run_app(WorkerOptions(request_fnc))
Hey, these arguments are not valid; they must be strings:
    model=openai.TTSModels,
    voice=openai.TTSVoices
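For context, this is exactly what the TypeError in the traceback above complains about: openai.TTSModels is a typing.Literal type alias, not a string, and a type alias cannot be JSON-serialized into the request body. A minimal stdlib repro (the Literal values here are my own stand-ins, not necessarily the plugin's exact model list):

```python
import json
from typing import Literal

# Stand-in for openai.TTSModels: a typing.Literal alias, not a string.
TTSModels = Literal["tts-1", "tts-1-hd"]

# Passing the alias itself into a JSON request body fails, reproducing the
# "_LiteralGenericAlias is not JSON serializable" error above (Python >= 3.9):
try:
    json.dumps({"model": TTSModels})
except TypeError as exc:
    print(exc)

# Passing one of the concrete string values works fine:
print(json.dumps({"model": "tts-1"}))
```

So the fix is to pass concrete string values, e.g. something like `openai.TTS(model="tts-1", voice="alloy")` (assuming those are among the values the plugin accepts).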
Hey, this should now be fixed in the newer versions. It was due to 11labs changing their pricing policy.
Does anyone know how to change the ElevenLabs voice to a custom voice? I couldn't find it in the documentation :/
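Not an official answer, but here is a sketch of what I believe works: the livekit-plugins-elevenlabs source defines a `Voice` dataclass that can be passed to `elevenlabs.TTS`. The field names below are an assumption based on reading the plugin source at the time of writing and may differ in your installed version; the local dataclass only mirrors that shape so the example is self-contained and runnable:

```python
from dataclasses import dataclass

# Local stand-in mirroring the Voice dataclass found in the
# livekit-plugins-elevenlabs source (an assumption; check your installed
# version's source for the real definition).
@dataclass
class Voice:
    id: str        # the voice ID shown in the ElevenLabs dashboard / VoiceLab
    name: str
    category: str  # e.g. "premade" or "cloned"

# With the real plugin, usage would look something like:
#   from livekit.plugins import elevenlabs
#   tts = elevenlabs.TTS(
#       voice=elevenlabs.Voice(
#           id="YOUR_CUSTOM_VOICE_ID",  # copied from the ElevenLabs dashboard
#           name="my-custom-voice",
#           category="cloned",
#       )
#   )
custom = Voice(id="YOUR_CUSTOM_VOICE_ID", name="my-custom-voice", category="cloned")
print(custom)
```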
This is a clearer description + RCA of #279.
I think the issue is related to the TTS streaming implementation of livekit-plugins-elevenlabs. I think that because, when I comment out this code in assistant.py, the very first assistant.say call is sent down to the agent playground.
Debugging further, I was able to verify that OpenAI is sending down the correct responses, but they weren't getting streamed as audio to the playground.
I also tried downgrading through all the 0.4.0 dev versions, and none of them fixed the issue. 0.3.0 didn't work at all with the other packages.
The code I'm using is from the official agents quickstart guide.