Despite the fix to flush Deepgram transcripts, we are still (though rarely) seeing Deepgram send only INTERIM transcripts without a FINAL.
If this is the case, then interrupt_min_words should be compared against self._transcribed_text, not self._transcribed_interim_text. Otherwise, we may interrupt the agent's response without the agent ever being prompted to respond to what interrupted it. This results in the agent appearing to freeze.
Proposed Solution 1
This line in VoicePipelineAgent:
text = self._transcribed_interim_text or self._transcribed_text
Should instead be:
text = self._transcribed_text
I would be happy to implement this if LiveKit agrees.
The issue with this solution is that interruptions will appear to lag a bit behind what the user actually says, so I'm not sure it is worth the tradeoff.
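To make the difference concrete, here is a minimal standalone sketch of the two behaviours. It is not the actual VoicePipelineAgent code: should_interrupt and its final_only flag are hypothetical names, and counting words with str.split() is an assumption made only for illustration.

```python
def should_interrupt(
    interim_text: str,
    final_text: str,
    min_words: int,
    final_only: bool,
) -> bool:
    """Hypothetical stand-in for the interrupt_min_words check."""
    if final_only:
        # Proposed Solution 1: only FINAL text counts toward the threshold.
        text = final_text
    else:
        # Current behaviour: interim text is preferred when present.
        text = interim_text or final_text
    # Assumption: whitespace splitting approximates the word count.
    return len(text.split()) >= min_words
```

With only an interim transcript available, the current behaviour interrupts while the proposed one waits for a FINAL, which is also where the extra lag comes from.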
Proposed Solution 2
Another solution could be an internal timer that falls back to _transcribed_interim_text if no FINAL event arrives within a certain amount of time. I would be less comfortable implementing this, but can give it a try. I don't see a downside to this approach in terms of user experience.
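A rough sketch of what the timer fallback could look like, kept separate from the agent so it can be reasoned about on its own. Everything here is an assumption for illustration: the InterimFallback class, its method names, and the 1 second default timeout are hypothetical and not part of the LiveKit API.

```python
import time
from typing import Callable, Optional


class InterimFallback:
    """Hypothetical helper: treats interim text as committed once it goes stale."""

    def __init__(
        self,
        timeout_s: float = 1.0,  # assumed default, would need tuning
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._interim_text = ""
        self._final_text = ""
        self._last_interim_at: Optional[float] = None

    def on_interim(self, text: str) -> None:
        # Each new interim restarts the wait for a FINAL.
        self._interim_text = text
        self._last_interim_at = self._clock()

    def on_final(self, text: str) -> None:
        # A FINAL supersedes whatever interim text was pending.
        self._final_text = text
        self._interim_text = ""
        self._last_interim_at = None

    def committed_text(self) -> str:
        """Return the text that is safe to act on right now."""
        stale = (
            self._last_interim_at is not None
            and self._clock() - self._last_interim_at >= self._timeout_s
        )
        if stale:
            # No FINAL arrived in time, so fall back to the interim text.
            return f"{self._final_text} {self._interim_text}".strip()
        return self._final_text
```

The clock is injectable so the timeout can be tested without sleeping. In the agent, the min-words check would read committed_text() instead of choosing between the two fields directly.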