Closed by TONY-STARK-TECH 3 months ago
Yeah, I'm hitting the same issue too. Waiting for help. @davidzhao
Running the voice assistant demo with openai.TTS gives me this error:
{"asctime": "2024-05-19 16:37:28,818", "level": "ERROR", "name": "livekit.agents", "message": "unhandled exception in the job entry <function entrypoint at 0x104c48a40>", "job_id": "AJ_KVYx8innpDEM", "pid": 21176}
Traceback (most recent call last):
  File "/Users/***/dev/livekit/main.py", line 36, in entrypoint
    tts = agents.tts.StreamAdapter(openai_tts, vad_stream)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/***/dev/livekit/venv/lib/python3.11/site-packages/livekit/agents/tts/stream_adapter.py", line 76, in __init__
    super().__init__(streaming_supported=True)
TypeError: TTS.__init__() missing 2 required keyword-only arguments: 'sample_rate' and 'num_channels'
Any idea why?
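For context, that `TypeError` is Python's standard complaint when required keyword-only parameters are omitted; the bare `*` in a signature makes everything after it keyword-only. A minimal standalone sketch of the mechanism (the class here is illustrative, not LiveKit's actual code):

```python
class TTS:
    # The bare '*' makes sample_rate and num_channels keyword-only,
    # matching the error message in the traceback above.
    def __init__(self, *, sample_rate: int, num_channels: int):
        self.sample_rate = sample_rate
        self.num_channels = num_channels

# Constructing without those keywords reproduces the error shape:
try:
    TTS()
except TypeError as e:
    print(e)
# → TTS.__init__() missing 2 required keyword-only arguments:
#   'sample_rate' and 'num_channels'
```

So the traceback says `StreamAdapter.__init__` calls `super().__init__(...)` without forwarding those two arguments, which the base class now requires.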
This is my code:
import asyncio
import logging

from livekit import agents
from livekit.agents import JobContext, JobRequest, WorkerOptions, cli
from livekit.agents.llm import (
    ChatContext,
    ChatMessage,
    ChatRole,
)
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, openai, silero


# This function is the entrypoint for the agent.
async def entrypoint(ctx: JobContext):
    # Create an initial chat context with a system prompt
    initial_ctx = ChatContext(
        messages=[
            ChatMessage(
                role=ChatRole.SYSTEM,
                text="You are a voice assistant created by LiveKit. Your interface with users will be voice. Pretend we're having a conversation, no special formatting or headings, just natural speech.",
            )
        ]
    )

    # VoiceAssistant is a class that creates a full conversational AI agent.
    # See https://github.com/livekit/agents/blob/main/livekit-agents/livekit/agents/voice_assistant/assistant.py
    # for details on how it works.
    openai_tts = openai.TTS(
        model=openai.TTSModels,
        voice=openai.TTSVoices,
    )
    vad = silero.VAD()
    vad_stream = vad.stream(min_silence_duration=1.0)
    tts = agents.tts.StreamAdapter(openai_tts, vad_stream)

    assistant = VoiceAssistant(
        vad=silero.VAD(),      # Voice Activity Detection
        stt=deepgram.STT(),    # Speech-to-Text
        llm=openai.LLM(),      # Language Model
        tts=tts,
        chat_ctx=initial_ctx,  # Chat history context
    )

    # Start the voice assistant with the LiveKit room
    assistant.start(ctx.room)
    await asyncio.sleep(3)

    # Greet the user with an initial message
    await assistant.say("Hey, how can I help you today?", allow_interruptions=True)


# This function is called when the worker receives a job request
# from a LiveKit server.
async def request_fnc(req: JobRequest) -> None:
    logging.info("received request %s", req)
    # Accepting the job tells the LiveKit server that this worker
    # wants the job. After the LiveKit server acknowledges that the job is accepted,
    # the entrypoint function is called.
    await req.accept(entrypoint)


if __name__ == "__main__":
    # Initialize the worker with the request function
    cli.run_app(WorkerOptions(request_fnc))
@gullerg there's a code bug in the `__init__` method of stream_adapter; you'd have to modify it yourself and add 'sample_rate' and 'num_channels' to that `__init__`.
But you don't need to do this: even if you change the code, there is still a problem like VADStream has no attribute 'stream'.
It's a code bug.
The experience with this SDK is a bit poor.
1. elevenlabs tts.py line 333: if it hits an API error, please log it. @MichaelYang1995 the China region can't reach ElevenLabs, and with a VPN you can only use the paid ElevenLabs API, so you'd have to become a paid API user to try.
2. openai.tts with StreamAdapter has some bugs. If you follow the agent quick-start doc and use the following code:

    openai_tts = openai.TTS(
        model=openai.TTSModels,
        voice=openai.TTSVoices,
    )
    vad = silero.VAD()
    vad_stream = vad.stream(min_silence_duration=1.0)
    tts = agents.tts.StreamAdapter(openai_tts, vad_stream)

you will get an error like VADStream has no attribute 'stream' from livekit/agents/voice_assistant/assistant.py:728. @keepingitneil is this a code bug? I want to use openai tts-1, not ElevenLabs. How do I fix it?
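To illustrate the likely mechanism (with stand-in classes, not LiveKit's real ones): the assistant apparently calls `.stream()` on the VAD object it was given, so passing the result of `vad.stream()` instead of the `VAD` itself produces exactly this `AttributeError`:

```python
class VADStream:
    """Stand-in for the object returned by VAD.stream()."""


class VAD:
    """Stand-in for the VAD class; only its stream() factory matters here."""

    def stream(self) -> VADStream:
        return VADStream()


vad = VAD()
vad_stream = vad.stream()

# Downstream code calls .stream() again on what it received; if that is
# already a VADStream, the call fails with the reported error:
try:
    vad_stream.stream()
except AttributeError as e:
    print(e)  # 'VADStream' object has no attribute 'stream'
```

If that reading is right, the fix on the caller's side is to hand the adapter/assistant the `VAD` instance, not a pre-opened stream.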
If I use the "paid elevenlabs API", can the previous issue inside "Agent start questions." be resolved?
Also, the image is broken. Could you upload it one more time, please?
@StarkDylan Are you Chinese? Could I add you on WeChat, big bro?
I am Australian Chinese and don't use WeChat.
I understand. Thank you very much!
@JARVISMindEngineer big bro, could you help look into this issue? I upgraded to the paid ElevenLabs API, but I still can't converse with the agent.
https://github.com/livekit/agents/issues/303
Maybe my Python version is too low? I'm on Python 3.11.
@MichaelYang1995 Try adding a proxy to the Deepgram request. I commented on your issue.
Fixed in #299
Just pulled the latest from main and tried to run the OpenAI TTS. Now I get the following error:
{"asctime": "2024-05-22 18:15:32,198", "level": "WARNING", "name": "livekit.agents", "message": "Running <Task pending name='Task-17' coro=<VoiceAssistant._synthesize_task() running at /Users/***/dev/livekit/venv2/lib/python3.11/site-packages/livekit/agents/voice_assistant/assistant.py:737> wait_for=<Future pending cb=[Task.task_wakeup()]>> took too long: 4.99 seconds", "job_id": "AJ_25Hy5b7Cj9xj", "pid": 49646}
{"asctime": "2024-05-22 18:15:32,204", "level": "ERROR", "name": "livekit.plugins.openai", "message": "openai tts main task failed in chunked stream", "job_id": "AJ_25Hy5b7Cj9xj", "pid": 49646}
Traceback (most recent call last):
  File "/Users/***/dev/livekit/venv2/lib/python3.11/site-packages/livekit/plugins/openai/tts.py", line 89, in _run
    async with self._session.post(
  File "/Users/***/dev/livekit/venv2/lib/python3.11/site-packages/aiohttp/client.py", line 1197, in __aenter__
    self._resp = await self._coro
                 ^^^^^^^^^^^^^^^^
  File "/Users/***/dev/livekit/venv2/lib/python3.11/site-packages/aiohttp/client.py", line 437, in _request
    data = payload.JsonPayload(json, dumps=self._json_serialize)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/***/.pyenv/versions/3.11.1/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/***/.pyenv/versions/3.11.1/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/***/.pyenv/versions/3.11.1/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/Users/***/.pyenv/versions/3.11.1/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type _LiteralGenericAlias is not JSON serializable
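The `_LiteralGenericAlias` in that traceback is the tell: `openai.TTSModels` and `openai.TTSVoices` look like `typing.Literal` type aliases, and the code below passes the aliases themselves as the `model`/`voice` values, so the JSON encoder in aiohttp chokes on them. A minimal standalone reproduction (the alias members here are assumed, mirroring the plugin's naming):

```python
import json
from typing import Literal

# Mirrors the shape of the plugin's type alias (member names assumed)
TTSModels = Literal["tts-1", "tts-1-hd"]

# Passing the alias itself is not JSON serializable...
try:
    json.dumps({"model": TTSModels})
except TypeError as e:
    print(e)  # Object of type _LiteralGenericAlias is not JSON serializable

# ...while a concrete member works fine:
print(json.dumps({"model": "tts-1"}))  # {"model": "tts-1"}
```

If that's right, the fix on the caller's side is to pass concrete strings, e.g. something like `openai.TTS(model="tts-1", voice="alloy")` (check the plugin's alias definitions for the exact accepted values).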
This is my code:
import asyncio
import logging

from livekit import agents
from livekit.agents import JobContext, JobRequest, WorkerOptions, cli
from livekit.agents.llm import (
    ChatContext,
    ChatMessage,
    ChatRole,
)
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, nltk, openai, silero


# This function is the entrypoint for the agent.
async def entrypoint(ctx: JobContext):
    # Create an initial chat context with a system prompt
    initial_ctx = ChatContext(
        messages=[
            ChatMessage(
                role=ChatRole.SYSTEM,
                text="You are a voice assistant created by LiveKit. Your interface with users will be voice. Pretend we're having a conversation, no special formatting or headings, just natural speech.",
            )
        ]
    )

    # VoiceAssistant is a class that creates a full conversational AI agent.
    # See https://github.com/livekit/agents/blob/main/livekit-agents/livekit/agents/voice_assistant/assistant.py
    # for details on how it works.
    openai_tts = openai.TTS(
        model=openai.TTSModels,
        voice=openai.TTSVoices,
    )
    STREAM_SENT_TOKENIZER = nltk.SentenceTokenizer(min_sentence_len=20)
    tts = agents.tts.StreamAdapter(tts=openai_tts, sentence_tokenizer=STREAM_SENT_TOKENIZER)

    assistant = VoiceAssistant(
        vad=silero.VAD(),      # Voice Activity Detection
        stt=deepgram.STT(),    # Speech-to-Text
        llm=openai.LLM(),      # Language Model
        tts=tts,
        chat_ctx=initial_ctx,  # Chat history context
    )

    # Start the voice assistant with the LiveKit room
    assistant.start(ctx.room)
    await asyncio.sleep(3)

    # Greet the user with an initial message
    await assistant.say("Hey, how can I help you today?", allow_interruptions=True)


# This function is called when the worker receives a job request
# from a LiveKit server.
async def request_fnc(req: JobRequest) -> None:
    logging.info("received request %s", req)
    # Accepting the job tells the LiveKit server that this worker
    # wants the job. After the LiveKit server acknowledges that the job is accepted,
    # the entrypoint function is called.
    await req.accept(entrypoint)


if __name__ == "__main__":
    # Initialize the worker with the request function
    cli.run_app(WorkerOptions(request_fnc))