Open andrewjhogue opened 1 month ago
Exactly, I've encountered this as well. The agent then responds, "Seems like there is a misunderstanding," because it is responding to the previous question.
Hey, could it be because two agents are in the same room? There is a known bug where two agents connect to the same room when using the `connect --room my_room` command.
Hey @theomonnom - thanks for the response. I don't think so; my code currently generates a unique room ID every time, and I'm also setting max_participants = 2 on room creation. So in theory this shouldn't be possible?
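For context, the room-naming side is just a fresh UUID per session - a minimal sketch of the approach (names are illustrative; the actual room-creation request that sets max_participants = 2 is omitted):

```python
import uuid

def make_room_name(prefix: str = "call") -> str:
    # A fresh uuid4 per session: collisions are effectively impossible,
    # so two sessions should never share a room by name.
    return f"{prefix}-{uuid.uuid4().hex}"

names = {make_room_name() for _ in range(10_000)}
print(len(names))  # 10000 -> no collisions
```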
@andrewjhogue @ubeytd can you repro on 0.7.1?
Yes, I have experienced this
Another variant of this I've seen is that sometimes the LLM inference seems to get "caught" - e.g., the AI will respond to the previous question I asked instead of the current one, but only once I ask the following question.
E.g.: I say something to the AI (Sentence 1). It completely ignores it; silence.
Then, when I continue speaking (Sentence 2), it replies to Sentence 2.
Then, out of nowhere, it replies to Sentence 1, gets confused, and says, "Seems like there is a misunderstanding."
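If it helps, the behavior is consistent with the user transcript being committed to the chat context only after inference has started. A toy simulation (purely hypothetical, not the livekit-agents code) reproduces the "answers the previous question" pattern:

```python
# Toy simulation (hypothetical, not livekit-agents internals) of a late
# transcript commit: inference runs on a stale chat context, so each reply
# answers the previous utterance instead of the current one.
chat_ctx = []

def llm_answer(ctx):
    # Stand-in for LLM inference: answers the last user message in the context.
    users = [text for role, text in ctx if role == "user"]
    return f"answer to: {users[-1]}" if users else "(silence)"

def turn(utterance, commit_before_inference):
    if commit_before_inference:   # correct ordering
        chat_ctx.append(("user", utterance))
        reply = llm_answer(chat_ctx)
    else:                         # buggy ordering: transcript commit lands late
        reply = llm_answer(chat_ctx)
        chat_ctx.append(("user", utterance))
    chat_ctx.append(("assistant", reply))
    return reply

r1 = turn("Sentence 1", commit_before_inference=False)
r2 = turn("Sentence 2", commit_before_inference=False)
print(r1)  # (silence)
print(r2)  # answer to: Sentence 1
```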
@davidzhao just tested on livekit-agents~=0.7.2; the issue still persists.
Could it be related to ElevenLabs latency? Could the TTS audio be arriving late, so the voice assistant pushes the message to the chat context after it's no longer relevant?
Not sure if it supports the theory, but conversation feels smoother with OpenAI TTS to me.
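One cheap way to probe the latency theory is to time each provider's time-to-first-audio. A generic sketch (`fake_tts` is a simulated stand-in, not any real TTS SDK; in practice you'd pass the provider's streaming response):

```python
import time

def time_to_first_audio(stream):
    # How long until the first audio chunk arrives from a streaming TTS
    # response. Works on any iterator of chunks, regardless of provider.
    start = time.monotonic()
    first_chunk = next(iter(stream))
    return time.monotonic() - start, first_chunk

def fake_tts(latency_s):
    # Stand-in stream (not a real TTS SDK): sleeps, then yields one frame.
    time.sleep(latency_s)
    yield b"\x00\x01"

elapsed, _ = time_to_first_audio(fake_tts(0.05))
print(f"time to first audio: {elapsed:.3f}s")
```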
@davidzhao I can confirm the issue still exists on livekit-agents~=0.7.2 as well.
I'm also now seeing the below sequence:
I'm experiencing this as well, following this thread...
As an aside, I know that Retell is using LiveKit Cloud for AI agent phone calls and web calls (but not this livekit-agents library), yet whatever they're doing seems to avoid this issue. Intuitively, then, I would expect this issue to be caused by some logic in livekit-agents, since they are using all of the same providers, rather than by ElevenLabs latency as suggested above. See: https://docs.retellai.com/api-references/create-phone-call
Confirming this is a bug in the current voice assistant code. We are making a few improvements in the upcoming version that should resolve this. A beta build will be cut early next week; I'll share a link here for everyone to try.
@davidzhao y'all are either crazy or amazing for releasing such an amazingly useful library as open source. I will be happy to help test in any way. I'm preparing to launch an AI math teacher tablet app backed by your infra and open-source libs. I owe y'all a huge debt of gratitude.
I've noticed that occasionally the agent will generate two distinct responses (LLM inference and TTS audio) for the same user input.
Interestingly, the second LLM inference isn't generated until after the first TTS audio is completed.
Usually, the second LLM inference + response is generated from the entire user input, while the first inference is generated from only a fraction of it (e.g., it doesn't always seem to wait for the user to finish, or handle an interruption cleanly).
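That timeline looks consistent with premature end-of-turn detection. A hypothetical sketch of how that yields two inferences for one utterance (the endpointing logic here is invented for illustration, not the library's):

```python
# Hypothetical sketch (not livekit-agents internals): if end-of-turn is
# detected at a mid-sentence pause, inference fires once on the partial
# transcript and again on the full one - two responses for one input.
def transcripts(words, early_endpoint):
    # early_endpoint: word index where a pause is misread as end of turn.
    return [" ".join(words[:early_endpoint]), " ".join(words)]

utterance = "what is two plus two".split()
turns = transcripts(utterance, early_endpoint=2)
responses = [f"LLM response to: {t!r}" for t in turns]
print(turns)           # ['what is', 'what is two plus two']
print(len(responses))  # 2 -> two inferences (and two TTS clips) for one input
```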
Will add logs here as repros happen locally.
I'm using this local setup in Chrome on a MacBook: