livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹
https://docs.livekit.io/agents
Apache License 2.0

Performance Inquiry: CPU Load with Single Audio Call on LiveKit-Agent #386

Closed: vlxdisluv closed this issue 3 weeks ago

vlxdisluv commented 2 months ago

Hello,

I am experiencing what seems to be a relatively high CPU load when running a single audio call on the LiveKit-agent. Here are the details:

I suspect that the audio encoder/decoder is contributing to this load. As far as I know, Opus is used for encoding and decoding by default.

My questions are:

  1. Is a 15-20% CPU load considered normal for a single audio call?
  2. Is there an option to switch to a simpler codec such as μ-law (ULaw) to reduce CPU usage?
  3. Are there any other suggestions on how to reduce CPU load for audio calls?
  4. Could this be a sign of a problem, or is this load within the expected range for the worker?

Screenshots illustrating the CPU and memory usage during the call are attached below.
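For anyone wanting to put a number on the load from inside the worker rather than from a system monitor, here is a minimal sketch using only the Python standard library. The names `CpuSampler` and `log_cpu` are made up for illustration and are not part of livekit-agents. It only measures the process it runs in, so any work done in child processes would not be counted.

```python
import asyncio
import time


class CpuSampler:
    """Tracks the share of one CPU core used by this process between samples."""

    def __init__(self) -> None:
        self._last_wall = time.perf_counter()
        self._last_cpu = time.process_time()

    def sample(self) -> float:
        """Return CPU usage (percent of one core) since the previous call."""
        now_wall = time.perf_counter()
        now_cpu = time.process_time()
        wall = now_wall - self._last_wall
        cpu = now_cpu - self._last_cpu
        self._last_wall, self._last_cpu = now_wall, now_cpu
        return 100.0 * cpu / wall if wall > 0 else 0.0


async def log_cpu(interval: float = 5.0, samples: int = 3) -> list[float]:
    """Collect a few CPU readings while the event loop keeps running other tasks."""
    sampler = CpuSampler()
    readings = []
    for _ in range(samples):
        await asyncio.sleep(interval)
        readings.append(sampler.sample())
    return readings


if __name__ == "__main__":
    print(asyncio.run(log_cpu(interval=1.0, samples=3)))
```

Because `time.process_time()` sums CPU time across all threads of the process, a reading can exceed 100 on a multi-core machine. Running `log_cpu` as a background task during a call and again while idle should show how much of the load comes from the call itself.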

Any insights or recommendations would be greatly appreciated.

Thank you!

Attachments:

Screenshot 2024-06-21 at 13 21 36 Screenshot 2024-06-21 at 13 22 51

Code:

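# Imports presumably needed by this snippet (assumed from the livekit-agents
# release current when the issue was filed; they were not part of the original post).
import asyncio
import logging

from livekit.agents import JobContext, JobRequest, WorkerOptions, cli
from livekit.agents.llm import ChatContext, ChatMessage, ChatRole
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, elevenlabs, openai, silero
from livekit.plugins.elevenlabs import Voice, VoiceSettings

# OPEN_AI_PROMPT is assumed to be a system-prompt string defined elsewhere in the project.

async def entrypoint(ctx: JobContext):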
    room_metadata = ctx.room.metadata

    # Create an initial chat context with a system prompt
    initial_ctx = ChatContext(
        messages=[
            ChatMessage(
                role=ChatRole.SYSTEM,
                text=OPEN_AI_PROMPT,
            )
        ]
    )

    custom_voice = Voice(
        id="21m00Tcm4TlvDq8ikWAM",
        name="CustomName",
        category="custom",
        settings=VoiceSettings(
            stability=0.75,
            similarity_boost=1.0,
            style=1.0,
            use_speaker_boost=True
        ),
    )

    # VoiceAssistant is a class that creates a full conversational AI agent.
    # See https://github.com/livekit/agents/blob/main/livekit-agents/livekit/agents/voice_assistant/assistant.py
    # for details on how it works.
    assistant = VoiceAssistant(
        interrupt_speech_duration=0.65,
        vad=silero.VAD(), # Voice Activity Detection
        stt=deepgram.STT(
            model="nova-2-phonecall"
        ), # Speech-to-Text
        llm=openai.LLM(model='gpt-4'), # Language Model
        tts=elevenlabs.TTS(
            voice=custom_voice
        ), # Text-to-Speech
        chat_ctx=initial_ctx, # Chat history context
        allow_interruptions=True, # Allow interruptions
    )

    # Initialize a flag to track if greeting has been sent
    greeting_sent = False

    # Define a callback function to handle participant join events
    async def on_participant_joined(participant):
        nonlocal greeting_sent
        if greeting_sent:
            return
        # Set the flag before awaiting so concurrent joins cannot trigger a second greeting
        greeting_sent = True
        await asyncio.sleep(1)
        await assistant.say("Hey.", allow_interruptions=True)

    # Set the event handler for the room
    ctx.room.on('participant_connected', lambda participant: asyncio.create_task(on_participant_joined(participant)))

    # Start the voice assistant with the LiveKit room
    assistant.start(ctx.room)

    # Check if there are participants already in the room
    if len(ctx.room.participants) > 0 and not greeting_sent:
        await on_participant_joined(next(iter(ctx.room.participants.values())))

# This function is called when the worker receives a job request
# from a LiveKit server.
async def request_fnc(req: JobRequest) -> None:
    logging.info("received request %s", req)
    # Accepting the job tells the LiveKit server that this worker
    # wants the job. After the LiveKit server acknowledges that the job is accepted,
    # the entrypoint function is called.
    await req.accept(entrypoint)

if __name__ == "__main__":
    # Initialize the worker with the request function
    cli.run_app(WorkerOptions(request_fnc))

theomonnom commented 1 month ago

Hey, we're going to investigate this issue this week. We've also reproduced unexpectedly high CPU usage on some workers.

theomonnom commented 3 weeks ago

Hey, sorry for the late reply. We were focused on making the VoiceAssistant more reliable. The unexpectedly high CPU usage is going to be fixed in the next version of livekit-plugins-silero (see this PR).
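
Since the fix lands in the VAD plugin, a rough way to check how expensive any per-frame audio step is on a given machine is to compare the CPU time it uses against the duration of audio it processes. The sketch below uses only the standard library. `realtime_factor` and `dummy_detector` are hypothetical names, and the detector is a trivial stand-in, not the Silero model.

```python
import time
from typing import Callable, Iterable


def realtime_factor(
    process_frame: Callable[[bytes], object],
    frames: Iterable[bytes],
    frame_seconds: float,
) -> float:
    """Return CPU seconds spent per second of audio fed through process_frame."""
    count = 0
    start = time.process_time()
    for frame in frames:
        process_frame(frame)
        count += 1
    cpu = time.process_time() - start
    audio = count * frame_seconds
    return cpu / audio if audio > 0 else 0.0


def dummy_detector(frame: bytes) -> bool:
    # Hypothetical stand-in for a VAD: treats a frame as speech if any byte is non-zero.
    return any(frame)


if __name__ == "__main__":
    # 100 frames of 10 ms each at 16 kHz, 16-bit mono (320 bytes per frame) = 1 s of silence
    silence = [bytes(320)] * 100
    print(realtime_factor(dummy_detector, silence, frame_seconds=0.010))
```

A factor of 0.15 would mean the step alone uses roughly 15% of one core while audio is streaming, so swapping in the real per-frame call gives a quick check of whether it accounts for the load reported above.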