livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹
https://docs.livekit.io/agents
Apache License 2.0

Interrupts don't work while using stream adapter #468

Closed Anandsure closed 1 month ago

Anandsure commented 1 month ago

For context:

whisper_stt = openai.STT(detect_language=True)
vad = silero.VAD()
use_stt = StreamAdapter(vad=vad, stt=whisper_stt)
use_tts = elevenlabs.TTS(
    api_key=os.environ["ELEVEN_API_KEY"],
    model_id="eleven_multilingual_v2",
    voice=custom_11labs_voice,
)

When I run this setup with the voice assistant, I notice the following:

  1. Message duplication: likely because there are two events, "FINAL_TRANSCRIPT" and "END_OF_SPEECH", and both messages are pushed into the context on the stream adapter side when using the OpenAI STT.
  2. Interrupts don't work. If I interrupt the agent with a new prompt, the agent keeps talking until it finishes the previous response and then moves on to the next message automatically. (This is not an issue when I stream using the Deepgram STT, but there's no language-detection option for this particular use case, so I NEED to use a stream adapter.)
  3. I realised that we're passing the vad param explicitly to the VoiceAssistant class in spite of having already declared one to build the StreamAdapter:

assistant = VoiceAssistant(
    vad=silero.VAD(),
    stt=use_stt,
    llm=gpt,
    tts=use_tts,
    chat_ctx=initial_ctx,
    fnc_ctx=DrSmilez(),
    allow_interruptions=True,
)

The above feels unnecessary and could potentially be causing the issue?
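For readers following item 1, here is a minimal sketch of how a duplicate could arise and one way to avoid it. This is not the library's implementation: the `Event` class and the event-type strings are hypothetical stand-ins, and the sketch assumes that a final transcript and an end-of-speech event can both carry the same text for a single utterance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    # Hypothetical stand-in for an STT stream event; not a livekit type.
    type: str  # "FINAL_TRANSCRIPT" or "END_OF_SPEECH"
    text: str


def commit_user_messages(events: list[Event]) -> list[str]:
    """Return the messages to push into the chat context, one per utterance."""
    messages: list[str] = []
    pending: Optional[str] = None  # transcript seen since the last end of speech
    for ev in events:
        if ev.type == "FINAL_TRANSCRIPT":
            # Remember the text, but do not commit it yet.
            pending = ev.text
        elif ev.type == "END_OF_SPEECH":
            # Commit once per utterance, preferring the final transcript if one arrived.
            text = pending if pending is not None else ev.text
            if text:
                messages.append(text)
            pending = None
    return messages


events = [
    Event("FINAL_TRANSCRIPT", "hello there"),
    Event("END_OF_SPEECH", "hello there"),
    Event("FINAL_TRANSCRIPT", "how are you"),
    Event("END_OF_SPEECH", "how are you"),
]
print(commit_user_messages(events))
```

Pushing a message on both event types would yield four entries for these two utterances; committing only at the end of speech yields two.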

pxz2016 commented 1 month ago

Same problem here. How can I fix it? I use Azure Speech.

keepingitneil commented 1 month ago

We have improvements to the voice assistant coming soon. You can try it with the following versions:

livekit-agents>=0.8.0.dev3
livekit-plugins-silero>=0.6.0.dev2
livekit-plugins-azure>=0.3.0.dev2
livekit-plugins-openai>=0.7.0.dev2
livekit-plugins-elevenlabs>=0.7.0.dev2

There are some slight breaking API changes for your agent entrypoint code. You can look here for reference: https://github.com/livekit/agents/blob/dev/examples/voice-assistant/minimal_assistant.py

Could you give that a try and see if you have the same issue?

pxz2016 commented 1 month ago

@keepingitneil This link is not found: https://github.com/livekit/agents/blob/dev/examples/voice-assistant/minimal_assistant.py
The dev branch does not exist.

keepingitneil commented 1 month ago

Ahh this has been recently merged and the versions have been released. Here's the main branch link:
https://github.com/livekit/agents/blob/main/examples/voice-assistant/minimal_assistant.py

And all of the above versions are the same but without the .devX suffix

pxz2016 commented 1 month ago

kitt.py

{"message": "unhandled exception while running the job task\nTraceback (most recent call last):\n File \"E:\gitee\code\livekit-agents-0.8.0\examples\_deployed\kitt\kitt.py\", line 88, in entrypoint\n chat = rtc.ChatManager(ctx.room)\n ^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\Python312\Lib\site-packages\livekit\rtc\chat.py\", line 39, in __init__\n self._lp = room.local_participant\n ^^^^^^^^^^^^^^^^^^^^^^\nAttributeError: 'Room' object has no attribute 'local_participant'", "pid": 2768, "job_id": "AJ_ShiPzvtCuTsA", "timestamp": "2024-07-25T06:28:54.172341+00:00"}

pxz2016 commented 1 month ago

livekit 0.12.0.dev1
livekit-agents 0.8.0
livekit-api 0.6.0
livekit-plugins-azure 0.3.0
livekit-plugins-openai 0.6.0
livekit-plugins-silero 0.6.0
livekit-protocol 0.6.0

pxz2016 commented 1 month ago

minimal_assistant.py

{"message": "Error in _synthesize_answer_task\nTraceback (most recent call last):\n File \"C:\Python312\Lib\site-packages\livekit\agents\utils\log.py\", line 16, in async_fn_logs\n return await fn(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\Python312\Lib\site-packages\livekit\agents\voice_assistant\voice_assistant.py\", line 455, in _synthesize_answer_task\n llm_stream = self._opts.will_synthesize_assistant_reply(self, copied_ctx)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"C:\Python312\Lib\site-packages\livekit\agents\voice_assistant\voice_assistant.py\", line 79, in _default_will_synthesize_assistant_reply\n return assistant.llm.chat(chat_ctx=chat_ctx, fnc_ctx=assistant.fnc_ctx)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nTypeError: LLM.chat() got an unexpected keyword argument 'chat_ctx'", "pid": 15548, "job_id": "AJ_9djEYFdBsPkC", "timestamp": "2024-07-25T06:52:59.548455+00:00"}

pxz2016 commented 1 month ago

@keepingitneil There are too many bugs, and the examples cannot run

theomonnom commented 1 month ago

Hey, it seems like the LLM plugins are not updated. Can you update livekit-plugins-openai to 0.7.0?
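A quick way to spot this kind of mismatch is to compare the installed versions against the suggested minimums. The sketch below is only an illustration: the two dictionaries are copied by hand from the versions posted earlier in this thread, and `release_tuple` and `outdated` are hypothetical helper names, not part of any livekit package.

```python
def release_tuple(version: str) -> tuple[int, ...]:
    """Turn '0.12.0.dev1' into (0, 12, 0), ignoring any non-numeric suffix."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


# Installed versions as reported earlier in this thread.
installed = {
    "livekit-agents": "0.8.0",
    "livekit-plugins-azure": "0.3.0",
    "livekit-plugins-openai": "0.6.0",
    "livekit-plugins-silero": "0.6.0",
}

# Minimum versions suggested earlier in this thread (without the .devX suffix).
required = {
    "livekit-agents": "0.8.0",
    "livekit-plugins-silero": "0.6.0",
    "livekit-plugins-azure": "0.3.0",
    "livekit-plugins-openai": "0.7.0",
    "livekit-plugins-elevenlabs": "0.7.0",
}


def outdated(installed: dict[str, str], required: dict[str, str]) -> list[str]:
    """List installed packages whose release is below the required minimum."""
    return sorted(
        name
        for name, minimum in required.items()
        # Packages that are not installed at all are skipped, not flagged.
        if name in installed
        and release_tuple(installed[name]) < release_tuple(minimum)
    )


print(outdated(installed, required))
```

In a real environment the `installed` dictionary could be filled with `importlib.metadata.version(name)` from the standard library instead of being typed in by hand.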

Anandsure commented 1 month ago

@theomonnom and @keepingitneil

I'm using the latest plugins and the error still persists: interrupts don't work properly when using a stream adapter.

I even turned off preemptive synthesis, but that doesn't seem to change anything.

I also still see duplicate transcripts whenever I use StreamAdapter.

theomonnom commented 1 month ago

Hey, on v0.8.0 interruption doesn't rely on the speech-to-text at all. I'm unable to reproduce the issue. Can you share more about it?

Anandsure commented 1 month ago

@theomonnom It happens when using STT plugins that don't support streaming, like the OpenAI STT, in conjunction with StreamAdapter.

This is the particular scenario where interrupts fail.

Highlighted Issue

Please refer to the code snippet above.

theomonnom commented 1 month ago

Hey, the duplicated speeches should be fixed in livekit-agents v0.8.3. I also tested using OpenAI with the stream adapter and was able to interrupt the agent successfully. Note that the interruption is triggered by the VAD, not by the STT.
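To make the last point concrete, here is a toy model of VAD-driven interruption. It is a sketch under stated assumptions, not the VoiceAssistant implementation: `speak`, `fake_vad`, and `run_demo` are hypothetical names, playback is simulated with `asyncio.sleep`, and no STT is involved at any point.

```python
import asyncio


async def speak(chunks: list[str], interrupted: asyncio.Event, spoken: list[str]) -> None:
    """Play reply chunks one by one, stopping once an interruption is flagged."""
    for chunk in chunks:
        if interrupted.is_set():
            break
        spoken.append(chunk)
        await asyncio.sleep(0.01)  # stand-in for audio playback time


async def fake_vad(interrupted: asyncio.Event, delay: float) -> None:
    """Hypothetical VAD: flags the start of user speech after `delay` seconds."""
    await asyncio.sleep(delay)
    interrupted.set()


async def run_demo() -> list[str]:
    interrupted = asyncio.Event()
    spoken: list[str] = []
    chunks = [f"chunk-{i}" for i in range(50)]
    # Playback and voice activity detection run concurrently; no transcript is needed.
    await asyncio.gather(
        speak(chunks, interrupted, spoken),
        fake_vad(interrupted, delay=0.05),
    )
    return spoken


spoken = asyncio.run(run_demo())
print(f"played {len(spoken)} of 50 chunks before the interruption")
```

In this model the agent stops as soon as speech is detected, regardless of how long a non-streaming STT would take to return a transcript.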