Closed: lukeocodes closed this 2 months ago
Looking into this, will have a draft PR shortly that I'll ask for your feedback on @lukeocodes
Here's the PR with a naive approach that seems to work OK (it leans heavily in favor of false positives): https://github.com/livekit/agents/pull/738
Curious where VAD plays into this. I figured only audio chunks that had voice would get passed down the pipeline to the STT plugin. I do like the idea of having a backup that blocks "silence", so this change is good, but I was wondering why it was needed in the first place...
It's tricky because VAD is also responsible for determining end of utterance and interruptions, so we cannot use the same VAD for both purposes. Using a separate VAD is an option, but here we are just looking for something very crude, where a higher false-positive rate is acceptable.
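To illustrate what "very crude" could look like, here is a minimal sketch of an energy-based silence gate. It is not necessarily what the PR above does. The function names, the 16-bit mono PCM frame format, and the threshold value are all assumptions made for illustration. The threshold is kept deliberately low so that anything ambiguous is treated as audio worth forwarding (a false positive) rather than dropped.

```python
import math
from array import array


def rms_int16(frame: bytes) -> float:
    """Root-mean-square amplitude of a frame of 16-bit PCM samples.

    Assumes native-endian signed 16-bit samples and a frame length that is
    a multiple of 2 bytes (array.frombytes raises ValueError otherwise).
    """
    samples = array("h")
    samples.frombytes(frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_probably_silent(frame: bytes, threshold: float = 50.0) -> bool:
    """Return True only when the frame is very quiet.

    The threshold is a hypothetical value, not a tuned one. Keeping it low
    biases the gate toward forwarding audio when in doubt.
    """
    return rms_int16(frame) < threshold
```

A caller could skip sending frames for which `is_probably_silent` returns True and leave end-of-utterance detection to the real VAD.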
Customer here. First tests look good - thanks for the quick response from both LiveKit and Deepgram!
Hey, I am writing from Deepgram.
One of our customers is using your Deepgram plugin and they've noticed that when not capturing audio, you appear to continue to send empty audio.
This also results in a connection being left open, and the impact is twofold. First, the open connections count against the customer's connection concurrency limit. Second, we keep billing for the audio: if Deepgram is receiving audio, we process it, so we bill for that compute time even though the audio is empty.
To keep a connection open, you can pause the audio capture and then send Deepgram a
{"type": "KeepAlive"}
message - equivalent to a WebSocket ping/pong - every 5-8 seconds to make sure the connection doesn't drop.
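As a rough sketch of that suggestion, the loop below sends the KeepAlive message at a fixed interval while audio capture is paused. It is not the plugin's actual implementation: `keepalive_loop`, `send_text`, and the `paused` event are hypothetical names, and `send_text` stands in for whatever coroutine the WebSocket client exposes for sending a text frame.

```python
import asyncio
import json
from typing import Awaitable, Callable

# The message body from the description above, serialized once.
KEEPALIVE_MESSAGE = json.dumps({"type": "KeepAlive"})


async def keepalive_loop(
    send_text: Callable[[str], Awaitable[None]],  # hypothetical: sends one text frame
    paused: asyncio.Event,  # hypothetical: set while audio capture is paused
    interval: float = 5.0,  # seconds; the low end of the suggested 5-8 s range
) -> None:
    """Send a KeepAlive message every `interval` seconds while paused.

    Runs until the task is cancelled. When `paused` is cleared, the loop
    blocks and sends nothing until it is set again.
    """
    while True:
        await paused.wait()
        await send_text(KEEPALIVE_MESSAGE)
        await asyncio.sleep(interval)
```

The caller would start this as a task next to the audio sender, set `paused` when it stops forwarding audio, clear it when audio resumes, and cancel the task when the connection is closed.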