livekit / agents

Build real-time multimodal AI applications šŸ¤–šŸŽ™ļøšŸ“¹
https://docs.livekit.io/agents

Duplicated agent responses (LLM inference + TTS audio) #323

Open andrewjhogue opened 1 month ago

andrewjhogue commented 1 month ago

I've noticed that occasionally the agent will generate two distinct responses (LLM inference and TTS audio) for the same user input.

Interestingly, the second LLM inference isn't generated until after the first TTS audio is completed.

Usually, the second LLM inference + response is generated from the entire user input, while the first inference is generated from only a fraction of it (e.g., it doesn't always seem to wait for the user to finish, or doesn't handle an interruption cleanly).

Another variant I've seen is that the LLM inference sometimes seems to get "caught": the AI responds to the previous question I asked instead of the current one, but only once I ask the following question.

I'll add logs here as repros happen locally.

I'm using this local setup on a MacBook with Chrome:

    # imports assumed for this snippet (livekit-agents ~0.7.x layout);
    # initial_ctx, fnc_ctx, and DEFAULT_VOICE are defined elsewhere in my code
    from livekit.agents.voice_assistant import VoiceAssistant
    from livekit.plugins import deepgram, elevenlabs, openai, silero

    assistant = VoiceAssistant(
        vad=silero.VAD(),
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o"),
        tts=elevenlabs.TTS(voice=DEFAULT_VOICE),
        chat_ctx=initial_ctx,
        fnc_ctx=fnc_ctx,
        allow_interruptions=True,
        debug=True
    )
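
For completeness, this is roughly how the assistant gets started in my worker entrypoint (a sketch: `build_assistant()` is a hypothetical stand-in for the construction above, and the worker/CLI wiring follows the usual livekit-agents pattern):

    from livekit.agents import JobContext, WorkerOptions, cli

    async def entrypoint(ctx: JobContext):
        # build_assistant() is a placeholder for the VoiceAssistant construction shown above
        assistant = build_assistant()
        # attach the assistant to the room this job was dispatched to
        assistant.start(ctx.room)
        # greet the user; allow_interruptions mirrors the setting above
        await assistant.say("Hey, how can I help you today?", allow_interruptions=True)

    if __name__ == "__main__":
        cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
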
ubeytd commented 1 month ago

Exactly, I've encountered this as well. The agent then responds with "Seems like there is a misunderstanding," because it's answering the previous question.

theomonnom commented 1 month ago

Hey, could it be because two agents are in the same room? There's a known bug where two agents connect to the same room when using the `connect --room my_room` command.

andrewjhogue commented 1 month ago

Hey @theomonnom, thanks for the response. I don't think so: my code currently generates a unique room ID every time, and I'm also setting max_participants = 2 on room creation, so in theory this shouldn't be possible, right?
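
Roughly what I mean, as a sketch using the server-side room API (`make_room_name()`, the URL, and the key variables are placeholders rather than my actual code):

    from livekit import api

    async def create_call_room() -> str:
        # one fresh room per conversation; make_room_name() is a hypothetical
        # helper that returns a unique (e.g. UUID-based) room name
        room_name = make_room_name()
        lkapi = api.LiveKitAPI(LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET)
        try:
            # cap the room at the caller plus one agent
            await lkapi.room.create_room(
                api.CreateRoomRequest(name=room_name, max_participants=2)
            )
        finally:
            await lkapi.aclose()
        return room_name
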

davidzhao commented 1 month ago

@andrewjhogue @ubeytd can you repro on 0.7.1?

ubeytd commented 1 month ago

> @andrewjhogue @ubeytd can you repro on 0.7.1?

Yes, I have experienced this.

> Another variant I've seen is that the LLM inference sometimes seems to get "caught": the AI responds to the previous question I asked instead of the current one, but only once I ask the following question.

E.g.: I say something to the AI (Sentence 1). It completely ignores it; silence.

Then when I continue speaking (Sentence 2), it replies to Sentence 2.

Then, out of nowhere, it replies to Sentence 1, gets confused, and says, "Seems like there is a misunderstanding."

ubeytd commented 1 month ago

@davidzhao just tested on livekit-agents~=0.7.2; the issue still persists.

Could it be related to ElevenLabs latency? Could the TTS audio be arriving late, so the voice assistant pushes the message into the chat context after it's no longer relevant?

Not sure whether that supports the theory, but the conversation feels smoother to me with OpenAI TTS.
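
For anyone who wants to compare, the change is just swapping the `tts=` argument in the VoiceAssistant constructor (a sketch; assumes livekit-plugins-openai is installed and uses its default model/voice):

    from livekit.plugins import openai

    # instead of tts=elevenlabs.TTS(voice=DEFAULT_VOICE), pass:
    tts = openai.TTS()  # defaults to OpenAI's built-in TTS model/voice
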

andrewjhogue commented 2 weeks ago

@davidzhao I can confirm the issue still exists on livekit-agents~=0.7.2 as well.

I'm also now seeing the sequence below (an event-logging sketch follows the list):

  1. User speaks.
  2. Agent starts synthesizing a response.
  3. User interrupts the agent with a second utterance.
  4. Agent responds to the first message.
  5. Agent then sends a second, separate response to the interrupting speech.
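
To pin down the exact ordering when this happens, something like the following event logging could help (a sketch; I'm assuming the VoiceAssistant emits these speech events through its EventEmitter interface, and the names may differ between versions):

    import logging

    logger = logging.getLogger("va-debug")

    # assumption: these are the speech-related events the VoiceAssistant emits;
    # adjust the names to whatever your installed version actually exposes
    for event in (
        "user_started_speaking",
        "user_stopped_speaking",
        "agent_started_speaking",
        "agent_stopped_speaking",
        "user_speech_committed",
        "agent_speech_committed",
        "agent_speech_interrupted",
    ):
        def _log(*args, _event=event):
            logger.info("voice assistant event: %s (args=%r)", _event, args)

        assistant.on(event, _log)
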
willsmanley commented 1 week ago

I'm experiencing this as well, following this thread...

As an aside, I know that Retell uses LiveKit Cloud for AI agent phone calls and web calls (but not this livekit-agents library), and whatever they're doing seems to avoid this issue. Intuitively, then, I'd expect the problem to be caused by logic in livekit-agents, since they're using all of the same providers, rather than by ElevenLabs latency as suggested above. See: https://docs.retellai.com/api-references/create-phone-call

davidzhao commented 1 week ago

Confirming this is a bug in the current voice assistant code. We're making a few improvements in the upcoming version that should resolve it. A beta build will be cut early next week; I'll share a link here for everyone to try.

willsmanley commented 3 days ago

@davidzhao Y'all are either crazy or amazing for releasing such an amazingly useful library as open source. I'll be happy to help test in any way. I'm preparing to launch an AI math-teacher tablet app backed by your infra and open-source libs. I owe y'all a huge debt of gratitude.