livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹
https://docs.livekit.io/agents
Apache License 2.0

AssistantLLM with VoicePipelineAgent not working properly #1086

Open xavier-pare-ai opened 2 days ago

xavier-pare-ai commented 2 days ago

Hi,

I've tried creating an agent using an OpenAI Assistant as the LLM. It joins the room and works as expected until after its first utterance. After it speaks the string I pass into `agent.say()` and I respond, it will either stay silent or speak sentences with the words out of order, sounding like gibberish. When I check the logs I see two errors:

```
2024-11-13 04:03:17,675 - ERROR livekit.agents.pipeline - Error in _stream_synthesis_task
Traceback (most recent call last):
  File "/home/appuser/.local/lib/python3.11/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/appuser/.local/lib/python3.11/site-packages/livekit/agents/pipeline/agent_output.py", line 273, in _stream_synthesis_task
    async for seg in tts_source:
  File "/home/appuser/.local/lib/python3.11/site-packages/livekit/agents/utils/aio/itertools.py", line 47, in tee_peer
    item = await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/appuser/.local/lib/python3.11/site-packages/livekit/agents/pipeline/pipeline_agent.py", line 869, in _llm_stream_to_str_generator
    async for chunk in stream:
  File "/home/appuser/.local/lib/python3.11/site-packages/livekit/agents/llm/llm.py", line 159, in __anext__
    raise exc from None
  File "/home/appuser/.local/lib/python3.11/site-packages/livekit/plugins/openai/beta/assistant_llm.py", line 510, in _main_task
    self._done_future.set_result(None)
asyncio.exceptions.InvalidStateError: invalid state
{"pid": 138, "job_id": "AJ_Xkw6xdYmimL3"}
```

And sometimes this one along with the first error message:

```
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': 'Thread thread_P5iSwRGvKxYIiEXK1GsVmDto already has an active run run_JlVSXO5euDaOpDaeY9aKyZWX.', 'type': 'invalid_request_error', 'param': None, 'code': None}}
```

The weird thing is that when I swap out the `AssistantLLM` for the regular OpenAI LLM, it works perfectly fine.
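For reference, the working setup differs only in the `llm` argument passed to `VoicePipelineAgent` — roughly this swap (the exact `openai.LLM` parameters here are just what I'd expect, reusing the `openai_model` read from the metadata below):

```diff
-        llm=AssistantLLM(
-            assistant_opts=AssistantOptions(
-                load_options=AssistantLoadOptions(
-                    assistant_id=assistant_id,
-                    thread_id=None,
-                )
-            ),
-            api_key=openai_api_key,
-        ),
+        llm=openai.LLM(model=openai_model, api_key=openai_api_key),
```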

Here is the code that isn't working:

```python
import asyncio
import os
import json
import logging
import uuid
from dataclasses import asdict, dataclass
from typing import Any, Dict

import boto3
from livekit import rtc
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    JobProcess,
    WorkerOptions,
    WorkerType,
    cli,
    llm,
)
from livekit.agents.multimodal import MultimodalAgent
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import openai, silero
from livekit.plugins.openai.beta import (
    AssistantLLM,
    AssistantLoadOptions,
    AssistantOptions,
    OnFileUploadedInfo,
)

from voicePipelineAgent import prewarm, run_voicepipeline_agent
from multiModalAgent import run_multimodal_agent

logger = logging.getLogger("dynamic-agent")
logger.setLevel(logging.INFO)


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    logger.info(f"connecting to room {ctx.room.name}")
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()
    metadata = json.loads(participant.metadata)

    agent_type = metadata.get("agent_type", "voice_pipeline")

    if agent_type == "multimodal":
        run_multimodal_agent(ctx, participant)
    else:
        openai_api_key = metadata.get("openai_api_key", "")
        openai_model = metadata.get("openai_model", "gpt-4o-mini")
        voice = metadata.get("voice", "alloy")
        voice_model = metadata.get("voice_model", "tts-1")
        assistant_id = metadata.get("assistant_id", "<assistantID placeholder I took out to post here>")

        initial_ctx = llm.ChatContext()

        agent = VoicePipelineAgent(
            vad=ctx.proc.userdata["vad"],
            stt=openai.STT(api_key=openai_api_key),
            llm=AssistantLLM(
                assistant_opts=AssistantOptions(
                    load_options=AssistantLoadOptions(
                        assistant_id=assistant_id,
                        thread_id=None,
                    )
                ),
                api_key=openai_api_key,
            ),
            tts=openai.TTS(model=voice_model, voice=voice, api_key=openai_api_key),
            chat_ctx=initial_ctx,
            allow_interruptions=False,
        )

        agent.start(ctx.room, participant)

        await agent.say("Hello?", allow_interruptions=False)

    logger.info("dynamic agent started")


if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,
            worker_type=WorkerType.ROOM,
            prewarm_fnc=prewarm,
        )
    )
```

xavier-pare-ai commented 2 days ago

I'm also on the most recent versions of all the LiveKit packages. Let me know if I can provide more info.