livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹
https://docs.livekit.io/agents
Apache License 2.0

Elevenlabs TTS error : TypeError: memoryview: length is not a multiple of itemsize #366

Closed ksingh-scogo closed 2 months ago

ksingh-scogo commented 4 months ago

Setup

Frontend: https://agents-playground.livekit.io/ connected to a project in a LiveKit Cloud account
LiveKit server: via LiveKit Cloud
Agent: main.py file below

import asyncio
import logging

from livekit.agents import JobContext, JobRequest, WorkerOptions, cli, tokenize, tts
from livekit.agents.llm import (
    ChatContext,
    ChatMessage,
    ChatRole,
)
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, openai, silero, elevenlabs

VOICE = elevenlabs.Voice(
    id="ssssxxxxxxx",
    name="xxxxxxx",
    category="professional"
)

async def entrypoint(ctx: JobContext):
    # Create an initial chat context with a system prompt 
    initial_ctx = ChatContext(
        messages=[
            ChatMessage(
                role=ChatRole.SYSTEM,
                text='''You are a voice assistant created by LiveKit. Your interface with users will be voice. You should use short and concise responses, and avoiding usage of unpronouncable punctuation. 
                ''',
            )
        ]
    )

#    openai_tts = tts.StreamAdapter(
#        tts=openai.TTS(voice="nova", model="tts-1"),
#        sentence_tokenizer=tokenize.basic.SentenceTokenizer(),
#    )

    assistant = VoiceAssistant(
        vad=silero.VAD(),
        stt=deepgram.STT(),
        llm=openai.LLM(),
        tts=elevenlabs.TTS(model_id='eleven_multilingual_v2', voice=VOICE), # Text-to-Speech
        # tts=openai_tts,
        chat_ctx=initial_ctx,
    )
    assistant.start(ctx.room)

    await asyncio.sleep(1)
    await assistant.say("Hi, welcome to ACME Support. I am Bro, your AI Assistant. How can I assist you today?", allow_interruptions=True)

async def request_fnc(req: JobRequest) -> None:
    logging.info("received request %s", req)
    await req.accept(entrypoint)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(request_fnc))

Error

2024-06-08 22:12:45,766 ERROR  livekit.agents  Error in _play_speech_if_validated_task
Traceback (most recent call last):
  File "/mnt/sdd/naman/miniconda3/envs/voice-support-3/lib/python3.12/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/sdd/naman/miniconda3/envs/voice-support-3/lib/python3.12/site-packages/livekit/agents/voice_assistant/assistant.py", line 694, in _play_speech_if_validated_task
    await self._playout_co(po_rx, tts_forwarder)
  File "/mnt/sdd/naman/miniconda3/envs/voice-support-3/lib/python3.12/site-packages/livekit/agents/voice_assistant/assistant.py", line 863, in _playout_co
    while i < len(buf.data):
                  ^^^^^^^^
  File "/mnt/sdd/naman/miniconda3/envs/voice-support-3/lib/python3.12/site-packages/livekit/rtc/audio_frame.py", line 90, in data
    return memoryview(self._data).cast("h")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: memoryview: length is not a multiple of itemsize
      job_id=AJ_Gcx5DzAC9jdK pid=32025
2024-06-08 22:12:45,767 ERROR  livekit.agents  unhandled exception in the job entry <function entrypoint at 0x7cb47885b560>
Traceback (most recent call last):
  File "/mnt/sdd/naman/voice_agent_backend/main2.py", line 61, in entrypoint
    await assistant.say("Hi, welcome to Scogo Technical Support. I am Sia, your AI Assistant. How can I assist you today?", allow_interruptions=True)
  File "/mnt/sdd/naman/miniconda3/envs/voice-support-3/lib/python3.12/site-packages/livekit/agents/voice_assistant/assistant.py", line 243, in say
    await self._play_atask
  File "/mnt/sdd/naman/miniconda3/envs/voice-support-3/lib/python3.12/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/sdd/naman/miniconda3/envs/voice-support-3/lib/python3.12/site-packages/livekit/agents/voice_assistant/assistant.py", line 694, in _play_speech_if_validated_task
    await self._playout_co(po_rx, tts_forwarder)
  File "/mnt/sdd/naman/miniconda3/envs/voice-support-3/lib/python3.12/site-packages/livekit/agents/voice_assistant/assistant.py", line 863, in _playout_co
    while i < len(buf.data):
                  ^^^^^^^^
  File "/mnt/sdd/naman/miniconda3/envs/voice-support-3/lib/python3.12/site-packages/livekit/rtc/audio_frame.py", line 90, in data
    return memoryview(self._data).cast("h")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: memoryview: length is not a multiple of itemsize
      job_id=AJ_Gcx5DzAC9jdK pid=32025
ksingh-scogo commented 4 months ago

When I run the above with the OpenAI TTS code instead, it does not throw this error (at least), and it works when a single user simulates the conversation with the AI agent.

However, if multiple users start their own separate conversations, the OpenAI TTS also breaks. I have created a separate issue for this: #365

theomonnom commented 4 months ago

Hey, thanks for the report, it seems like 11labs is returning odd frame sizes (or maybe the mp3 decoder). I'll investigate asap!
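For readers unfamiliar with the error, here is a minimal sketch of the failure mode in the traceback above. It only assumes that the frame data is cast to 16-bit samples with `memoryview(...).cast("h")`, as the `audio_frame.py` line in the traceback shows; the buffer contents and the `cast_error` helper are made up for illustration.

```python
# 16-bit PCM samples take two bytes each, so a buffer cast with
# format "h" must contain an even number of bytes.
even = bytearray(4)  # two complete 16-bit samples
odd = bytearray(3)   # one and a half samples: cannot be cast

samples = memoryview(even).cast("h")
print(len(samples))  # number of 16-bit samples in the even buffer


def cast_error(buf):
    """Return the TypeError message raised by the cast, or None if it succeeds."""
    try:
        memoryview(buf).cast("h")
    except TypeError as exc:
        return str(exc)
    return None


# The odd-length buffer reproduces the message from the traceback.
print(cast_error(odd))
```

Any payload with an odd byte count reaching the frame buffer would fail this way, whatever its source.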

davidzhao commented 4 months ago

What do you mean by multiple users starting their separate conversations? Can you provide an example?

ksingh-scogo commented 3 months ago

> What do you mean by multiple users starting their separate conversations? Can you provide an example?

A single user accessing the frontend https://agents-playground.livekit.io/ works fine.

The next test is to make this link public and have several users open the frontend and start conversations.

In my test:
User-1: macOS Chrome
User-2: iPhone Safari
User-3: Android Chrome

This is what I mean by multiple users starting their separate conversations, and it is breaking at the moment.

Happy to work with you on testing and fixing this.

Love LiveKit, BTW.

ksingh-scogo commented 3 months ago

> Hey, thanks for the report, it seems like 11labs is returning odd frame sizes (or maybe the mp3 decoder). I'll investigate asap!

@theomonnom I believe so too. Is there a parameter you think is worth trying? Happy to help troubleshoot this with you.

ZanSara commented 3 months ago

@theomonnom I'm having the same issue (TypeError: memoryview: length is not a multiple of itemsize). Did you find a solution?

ksingh-scogo commented 3 months ago

@ZanSara I'm not sure I found a clean solution, but it started working once I added more credit to my ElevenLabs account. Check whether your ElevenLabs account has enough balance.

keepingitneil commented 2 months ago

I believe this happens when ElevenLabs returns a non-audio body, for example when you have reached a limit on your pricing tier. Closing this issue and adding a ticket on our end to expose better logs for this case.
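Until better logs are available, a guard along these lines could surface the problem earlier. This is only a sketch under assumptions: `ensure_audio_response` and `NonAudioResponseError` are hypothetical names, not part of livekit-agents or the ElevenLabs plugin, and it assumes the error response carries a non-audio Content-Type (for example JSON), which this thread does not confirm.

```python
class NonAudioResponseError(Exception):
    """Hypothetical error raised when a TTS response is not audio."""


def ensure_audio_response(content_type: str, body: bytes) -> bytes:
    """Return body unchanged if the response claims to be audio.

    Otherwise raise with a short preview of the body, so that a quota
    or billing message appears in the logs instead of a memoryview
    TypeError further down the pipeline.
    """
    if not content_type.lower().startswith("audio/"):
        # Assumption: the provider sends a readable text/JSON error body.
        preview = body[:200].decode("utf-8", errors="replace")
        raise NonAudioResponseError(
            f"expected audio, got {content_type!r}: {preview}"
        )
    return body


# Usage with made-up payloads:
ok = ensure_audio_response("audio/mpeg", b"\x00\x01")
try:
    ensure_audio_response("application/json", b'{"detail": "quota exceeded"}')
except NonAudioResponseError as exc:
    print(exc)
```

The check runs before any decoding, so the failure names the actual response body rather than a frame-size symptom.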