livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹
https://docs.livekit.io/agents
Apache License 2.0
1.03k stars 201 forks

openai.tts with StreamAdapter has some bugs #298

Closed TONY-STARK-TECH closed 3 months ago

TONY-STARK-TECH commented 3 months ago

The experience with this SDK is a bit rough so far.

1. elevenlabs tts.py, line 333: if an API error occurs, please log it. @MichaelYang1995 the China region can't reach ElevenLabs; even with a VPN, only the paid ElevenLabs API works. Become a paid API user to try it.

2. openai.tts with StreamAdapter has some bugs. If you follow the agent quick-start doc and use the following code:

openai_tts = openai.TTS(
            model=openai.TTSModels, 
            voice=openai.TTSVoices)
    vad = silero.VAD()
    vad_stream = vad.stream(min_silence_duration=1.0)
    tts = agents.tts.StreamAdapter(openai_tts, vad_stream)

you will get an error like VADStream has no attribute 'stream', from livekit/agents/voice_assistant/assistant.py:728. @keepingitneil it's a code bug, right? I want to use OpenAI tts-1, not ElevenLabs. How do I fix it?

MichaelYang1995 commented 3 months ago

Yeah, I'm hitting the same issue too. Waiting for help. @davidzhao

gullerg commented 3 months ago

Running the voice assistant demo with openai.TTS gives me this error:

{"asctime": "2024-05-19 16:37:28,818", "level": "ERROR", "name": "livekit.agents", "message": "unhandled exception in the job entry <function entrypoint at 0x104c48a40>", "job_id": "AJ_KVYx8innpDEM", "pid": 21176}

Traceback (most recent call last):
  File "/Users/***/dev/livekit/main.py", line 36, in entrypoint
    tts = agents.tts.StreamAdapter(openai_tts, vad_stream)
  File "/Users/***/dev/livekit/venv/lib/python3.11/site-packages/livekit/agents/tts/stream_adapter.py", line 76, in __init__
    super().__init__(streaming_supported=True)
TypeError: TTS.__init__() missing 2 required keyword-only arguments: 'sample_rate' and 'num_channels'
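For what it's worth, that last line is ordinary Python behavior rather than anything LiveKit-specific: parameters declared after a bare `*` are keyword-only, so when a base class adds new required keyword-only parameters, every subclass `__init__` that calls `super().__init__()` breaks until it forwards them too. A minimal sketch with hypothetical classes (not the real livekit API):

```python
# Hypothetical stand-ins for the library's classes, to reproduce the TypeError.

class TTS:
    def __init__(self, *, streaming_supported: bool,
                 sample_rate: int, num_channels: int) -> None:
        self.streaming_supported = streaming_supported
        self.sample_rate = sample_rate
        self.num_channels = num_channels

class BrokenAdapter(TTS):
    def __init__(self) -> None:
        # Omits the two required keyword-only args -> TypeError at construction.
        super().__init__(streaming_supported=True)

class FixedAdapter(TTS):
    def __init__(self) -> None:
        super().__init__(streaming_supported=True, sample_rate=24000, num_channels=1)

try:
    BrokenAdapter()
except TypeError as e:
    print(e)  # ...missing 2 required keyword-only arguments...

print(FixedAdapter().sample_rate)  # 24000
```

So the adapter's `__init__` has to be updated in the library whenever the `TTS` base class grows new required parameters, which matches the workaround discussed below.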

Any idea why?

This is my code:

import asyncio
import logging

from livekit import agents

from livekit.agents import JobContext, JobRequest, WorkerOptions, cli
from livekit.agents.llm import (
    ChatContext,
    ChatMessage,
    ChatRole,
)
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, openai, silero

# This function is the entrypoint for the agent.
async def entrypoint(ctx: JobContext):
    # Create an initial chat context with a system prompt 
    initial_ctx = ChatContext(
        messages=[
            ChatMessage(
                role=ChatRole.SYSTEM,
                text="You are a voice assistant created by LiveKit. Your interface with users will be voice. Pretend we're having a conversation, no special formatting or headings, just natural speech.",
            )
        ]
    )

    # VoiceAssistant is a class that creates a full conversational AI agent.
    # See https://github.com/livekit/agents/blob/main/livekit-agents/livekit/agents/voice_assistant/assistant.py
    # for details on how it works.

    openai_tts = openai.TTS(
            model=openai.TTSModels, 
            voice=openai.TTSVoices)
    vad = silero.VAD()
    vad_stream = vad.stream(min_silence_duration=1.0)
    tts = agents.tts.StreamAdapter(openai_tts, vad_stream)

    assistant = VoiceAssistant(
        vad=silero.VAD(), # Voice Activity Detection
        stt=deepgram.STT(), # Speech-to-Text
        llm=openai.LLM(), # Language Model
        tts=tts,
        chat_ctx=initial_ctx, # Chat history context
    )

    # Start the voice assistant with the LiveKit room
    assistant.start(ctx.room)

    await asyncio.sleep(3)

    # Greets the user with an initial message
    await assistant.say("Hey, how can I help you today?", allow_interruptions=True)

# This function is called when the worker receives a job request
# from a LiveKit server.
async def request_fnc(req: JobRequest) -> None:
    logging.info("received request %s", req)
    # Accept the job tells the LiveKit server that this worker
    # wants the job. After the LiveKit server acknowledges that job is accepted,
    # the entrypoint function is called.
    await req.accept(entrypoint)

if __name__ == "__main__":
    # Initialize the worker with the request function
    cli.run_app(WorkerOptions(request_fnc))
TONY-STARK-TECH commented 3 months ago

@gullerg there's a code bug in the __init__ method of stream_adapter; as a workaround you can modify it yourself and add 'sample_rate' and 'num_channels' to that __init__ method.

image

But you shouldn't need to do this. Even if you change the code, there is still a problem like VADStream has no attribute 'stream'.

Code bug here.

MichaelYang1995 commented 3 months ago

> The experience with this SDK is a bit poor … openai.tts with StreamAdapter has some bugs … you will get an error like VADStream has no attribute 'stream' … (quoted from the original post above)

If I use the paid ElevenLabs API, can the earlier issue in "Agent start questions." be resolved?

Also, the image is broken. Could you upload it one more time, please?

TONY-STARK-TECH commented 3 months ago
  1. The image shows I have a paid ElevenLabs subscription.

image

  2. No, I can't promise it will work, but give it a try; it only costs $1. It worked for me after paying and following the quick-start code in the docs.
  3. Right now, my problem is that openai.tts with streaming does not work.
MichaelYang1995 commented 3 months ago

@StarkDylan Are you Chinese? Could I add you on WeChat, big bro?

TONY-STARK-TECH commented 3 months ago

I am Australian Chinese and don't use WeChat.

MichaelYang1995 commented 3 months ago

I understand it. Thank you very much!

MichaelYang1995 commented 3 months ago

> I am an Australian Chinese, do not use WeChat

@JARVISMindEngineer big bro, could you help look into this issue? I upgraded to the paid ElevenLabs API, but I still can't converse with the agent.

https://github.com/livekit/agents/issues/303

Maybe my Python version is too low? My Python version is 3.11.

TONY-STARK-TECH commented 3 months ago

@MichaelYang1995 Add a proxy to the Deepgram request and give it a try. I commented on your issue.

theomonnom commented 3 months ago

Fixed in #299

gullerg commented 3 months ago

Just pulled the latest from main and tried to run the OpenAI TTS. Now I get the following error:

{"asctime": "2024-05-22 18:15:32,198", "level": "WARNING", "name": "livekit.agents", "message": "Running <Task pending name='Task-17' coro=<VoiceAssistant._synthesize_task() running at /Users/***/dev/livekit/venv2/lib/python3.11/site-packages/livekit/agents/voice_assistant/assistant.py:737> wait_for=<Future pending cb=[Task.task_wakeup()]>> took too long: 4.99 seconds", "job_id": "AJ_25Hy5b7Cj9xj", "pid": 49646}
{"asctime": "2024-05-22 18:15:32,204", "level": "ERROR", "name": "livekit.plugins.openai", "message": "openai tts main task failed in chunked stream", "job_id": "AJ_25Hy5b7Cj9xj", "pid": 49646}

Traceback (most recent call last):
  File "/Users/***/dev/livekit/venv2/lib/python3.11/site-packages/livekit/plugins/openai/tts.py", line 89, in _run
    async with self._session.post(
  File "/Users/***/dev/livekit/venv2/lib/python3.11/site-packages/aiohttp/client.py", line 1197, in __aenter__
    self._resp = await self._coro
  File "/Users/***/dev/livekit/venv2/lib/python3.11/site-packages/aiohttp/client.py", line 437, in _request
    data = payload.JsonPayload(json, dumps=self._json_serialize)
  File "/Users/***/dev/livekit/venv2/lib/python3.11/site-packages/aiohttp/payload.py", line 396, in __init__
    dumps(value).encode(encoding),
  File "/Users/***/.pyenv/versions/3.11.1/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/Users/***/.pyenv/versions/3.11.1/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/***/.pyenv/versions/3.11.1/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
  File "/Users/***/.pyenv/versions/3.11.1/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type _LiteralGenericAlias is not JSON serializable
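The `_LiteralGenericAlias` in the last line is a strong hint: `openai.TTSModels` and `openai.TTSVoices` look like `typing.Literal` aliases (type annotations listing the allowed values), and the code passes the alias objects themselves instead of one of their string values, so the alias ends up in the JSON request body. A small standalone sketch (the aliases below are hypothetical stand-ins, not the plugin's real ones):

```python
import json
from typing import Literal, get_args

# Hypothetical stand-ins for the plugin's type aliases.
TTSModels = Literal["tts-1", "tts-1-hd"]
TTSVoices = Literal["alloy", "echo", "nova"]

try:
    json.dumps({"model": TTSModels})  # the alias object is not serializable
except TypeError as e:
    print(e)  # Object of type _LiteralGenericAlias is not JSON serializable

# Correct usage: pick one of the concrete string values.
print(get_args(TTSModels))  # ('tts-1', 'tts-1-hd')
print(json.dumps({"model": "tts-1", "voice": "alloy"}))
```

If that is what's happening, constructing the TTS with concrete strings, e.g. `openai.TTS(model="tts-1", voice="alloy")`, should serialize cleanly ("tts-1" and "alloy" are standard OpenAI TTS model/voice names).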

This is my code:

import asyncio
import logging

from livekit import agents

from livekit.agents import JobContext, JobRequest, WorkerOptions, cli
from livekit.agents.llm import (
    ChatContext,
    ChatMessage,
    ChatRole,
)
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, nltk, openai, silero

# This function is the entrypoint for the agent.
async def entrypoint(ctx: JobContext):
    # Create an initial chat context with a system prompt 
    initial_ctx = ChatContext(
        messages=[
            ChatMessage(
                role=ChatRole.SYSTEM,
                text="You are a voice assistant created by LiveKit. Your interface with users will be voice. Pretend we're having a conversation, no special formatting or headings, just natural speech.",
            )
        ]
    )

    # VoiceAssistant is a class that creates a full conversational AI agent.
    # See https://github.com/livekit/agents/blob/main/livekit-agents/livekit/agents/voice_assistant/assistant.py
    # for details on how it works.

    openai_tts = openai.TTS(
            model=openai.TTSModels, 
            voice=openai.TTSVoices
    )
    STREAM_SENT_TOKENIZER = nltk.SentenceTokenizer(min_sentence_len=20)
    tts = agents.tts.StreamAdapter(tts=openai_tts, sentence_tokenizer=STREAM_SENT_TOKENIZER)

    assistant = VoiceAssistant(
        vad=silero.VAD(), # Voice Activity Detection
        stt=deepgram.STT(), # Speech-to-Text
        llm=openai.LLM(), # Language Model
        tts=tts,
        chat_ctx=initial_ctx, # Chat history context
    )

    # Start the voice assistant with the LiveKit room
    assistant.start(ctx.room)

    await asyncio.sleep(3)

    # Greets the user with an initial message
    await assistant.say("Hey, how can I help you today?", allow_interruptions=True)

# This function is called when the worker receives a job request
# from a LiveKit server.
async def request_fnc(req: JobRequest) -> None:
    logging.info("received request %s", req)
    # Accept the job tells the LiveKit server that this worker
    # wants the job. After the LiveKit server acknowledges that job is accepted,
    # the entrypoint function is called.
    await req.accept(entrypoint)

if __name__ == "__main__":
    # Initialize the worker with the request function
    cli.run_app(WorkerOptions(request_fnc))
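Putting the pieces together, an unverified sketch of what the TTS setup above would look like with both issues addressed, assuming the plugin accepts plain strings for model/voice (the Literal aliases only describe the allowed values) and that StreamAdapter takes a sentence tokenizer rather than a VAD stream, as in the code above:

```python
from livekit import agents
from livekit.plugins import nltk, openai

openai_tts = openai.TTS(
    model="tts-1",   # a concrete model name, not the openai.TTSModels alias
    voice="alloy",   # a concrete voice name, not the openai.TTSVoices alias
)
tokenizer = nltk.SentenceTokenizer(min_sentence_len=20)
tts = agents.tts.StreamAdapter(tts=openai_tts, sentence_tokenizer=tokenizer)
```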