Deepgram plugin: connections stay open

livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹

https://docs.livekit.io/agents

Apache License 2.0


Deepgram plugin: connections stay open #736

Status: Closed (lukeocodes closed this issue 2 months ago)

lukeocodes commented 2 months ago

Hey, I am writing from Deepgram.

One of our customers is using your Deepgram plugin and they've noticed that when not capturing audio, you appear to continue to send empty audio.

This also leaves the connection open, which has two impacts. First, the open connections count against the connection concurrency limits. Second, we keep billing for audio: if Deepgram is receiving audio we will process it, so we bill for that time on our compute, even though the audio is empty.

To keep a connection open, you can pause the audio capture and then send Deepgram a {"type": "KeepAlive"} message - equivalent to a websocket ping/pong - every 5-8 seconds to make sure the connection doesn't drop.
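A minimal sketch of the keep-alive suggestion above, not the plugin's actual code. It assumes the caller supplies a hypothetical `send_text` coroutine that writes one text frame to the already-open Deepgram websocket; `keepalive_loop`, `stop`, and `interval_s` are illustrative names, and the 5-second default is simply the low end of the 5-8 second range mentioned above.

```python
import asyncio
import json
from typing import Awaitable, Callable

# The JSON text frame described in the comment above.
KEEPALIVE_MESSAGE = json.dumps({"type": "KeepAlive"})


async def keepalive_loop(
    send_text: Callable[[str], Awaitable[None]],  # hypothetical: sends one text frame
    stop: asyncio.Event,
    interval_s: float = 5.0,  # assumption: low end of the suggested 5-8 s window
) -> None:
    """Send a KeepAlive frame every `interval_s` seconds until `stop` is set."""
    while not stop.is_set():
        await send_text(KEEPALIVE_MESSAGE)
        try:
            # Sleep for one interval, but wake immediately if asked to stop.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass  # interval elapsed without a stop request; send the next frame
```

The idea would be to start this loop when audio capture pauses and set `stop` when capture resumes, so that no empty audio is sent in between.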

keepingitneil commented 2 months ago

Looking into this, will have a draft PR shortly that I'll ask for your feedback on @lukeocodes

Here's the PR with a naive approach that seems to work ok (leans heavily in favor of false-positives): https://github.com/livekit/agents/pull/738

paulingalls commented 2 months ago

Curious where VAD plays into this. I figured only audio chunks that had voice would get passed down the pipeline to the STT plugin. I do like the idea of having a backup that blocks "silence", so this change is good, but I was wondering why it was needed in the first place...

davidzhao commented 2 months ago

It's tricky because VAD is also responsible for determining end of utterance and interruptions, so we cannot use the same VAD for both purposes. Using a separate VAD is an option, but here we are just looking for something very crude, where a higher false-positive rate is acceptable.
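To illustrate what a crude gate of this kind could look like (this is not the approach taken in the linked PR, which is not shown in this thread), here is a minimal energy-threshold sketch. It assumes frames are native-endian 16-bit PCM with an even byte length; `rms_int16`, `should_forward`, and the threshold value are hypothetical. A low threshold errs toward forwarding audio, which is the false-positive bias described above.

```python
import math
from array import array


def rms_int16(pcm: bytes) -> float:
    """Root-mean-square amplitude of a frame of 16-bit PCM samples."""
    samples = array("h")  # signed 16-bit, native byte order (assumption)
    samples.frombytes(pcm)  # raises ValueError if len(pcm) is odd
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_forward(pcm: bytes, threshold: float = 100.0) -> bool:
    """Return True if the frame is loud enough to be worth sending to STT.

    The threshold is an illustrative guess; keeping it low means background
    noise is sometimes forwarded, but quiet speech is rarely dropped.
    """
    return rms_int16(pcm) >= threshold
```

Frames for which `should_forward` returns False would be dropped rather than sent, which is what lets the connection go idle.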

jmacAJ commented 2 months ago

Customer here. First tests look good - thanks for the quick response from both LiveKit and Deepgram!