livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹
https://docs.livekit.io/agents
Apache License 2.0

energy filter for deepgram should be optional #792

Open jezell opened 1 month ago

jezell commented 1 month ago

Recently, an optimization was added to avoid sending blank audio frames to Deepgram. This is nice if you don't need timings:

https://github.com/livekit/agents/pull/738/files

However, if you need timings, this gets in the way of #779: Deepgram won't return proper timings, because the energy filter drops frames before they reach it.

jezell commented 1 month ago

Thinking more on this, there are two options. The energy filter could be used to open the stream, and to close it after a bit of inactivity, so that the TTS timings come across the wire with an offset from the initial position. Alternatively, the filter could keep track of the locations of the dropped frames and use them to correct the timings returned from Deepgram. Either way, timings are definitely important on the transcript data.
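
To make the second option concrete, here is a minimal sketch of the bookkeeping, assuming the filter sees every frame and knows its duration. `DroppedFrameTracker`, `push`, and `to_original` are hypothetical names for illustration, not part of the Deepgram plugin or the Deepgram API. The idea is to record how much audio was dropped before each point on the forwarded stream, then add that amount back to any timestamp Deepgram reports.

```python
from bisect import bisect_left, bisect_right


class DroppedFrameTracker:
    """Hypothetical helper: maps timestamps on the filtered (forwarded)
    stream back to positions in the original, unfiltered audio."""

    def __init__(self) -> None:
        self._sent = 0.0      # seconds of audio forwarded so far
        self._dropped = 0.0   # seconds of audio dropped so far
        self._gap_pos = []    # forwarded-stream position of each gap
        self._gap_total = []  # cumulative dropped seconds through each gap

    def push(self, duration: float, forwarded: bool) -> None:
        """Record one frame; `forwarded` is False when the filter dropped it."""
        if forwarded:
            self._sent += duration
            return
        self._dropped += duration
        if self._gap_pos and self._gap_pos[-1] == self._sent:
            # Still inside the same run of dropped frames: extend that gap.
            self._gap_total[-1] = self._dropped
        else:
            self._gap_pos.append(self._sent)
            self._gap_total.append(self._dropped)

    def to_original(self, t: float, is_end: bool = False) -> float:
        """Shift a reported timestamp by the audio dropped before it.

        A timestamp that lands exactly on a gap is ambiguous: a word start
        belongs after the gap, a word end belongs before it.
        """
        search = bisect_left if is_end else bisect_right
        i = search(self._gap_pos, t)
        return t + (self._gap_total[i - 1] if i else 0.0)


# Usage: 1.0 s of speech, 2.0 s of dropped silence, then 1.0 s of speech.
tracker = DroppedFrameTracker()
for forwarded in [True, True, False, False, False, False, True, True]:
    tracker.push(0.5, forwarded)

# A word reported at 1.25 s on the filtered stream was spoken at 3.25 s.
print(tracker.to_original(1.25))
```

The gap list only grows by one entry per run of silence, so the lookup stays cheap even on long sessions. The first option (closing and reopening the stream) would need just a single offset per stream instead of this list.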

keepingitneil commented 1 month ago

Makes sense to me. I made a ticket internally, but won't start working on it until mid-week. Happy to accept a PR on this if you get to it first.