Open sjpritchard opened 6 years ago
Yeah, actually you're not the first person to suggest this with regard to Opus. Here's a mailing list discussion I found as an example.
In principle it should be easy to expose VAD state; you'd just add a getter in OpusEncoder.cs that pulls the value out of the SILK state. A few caveats come with this approach though:
There are a few alternatives as well
Thanks for the suggestions - I'll take a look. I was also looking through the Opus RFC and wondered if I might be able to directly inspect each encoded frame, as it appears that the SILK layer of each frame has a VAD flag set. If each frame covers a constant time period, I might be able to use this flag as a counter.
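For anyone who wants to try the frame-inspection idea, here is a rough sketch of reading those flags straight from an encoded packet. It is written in Python for brevity rather than against Concentus, and `silk_vad_flags` is a made-up name, not part of any library. It assumes my reading of RFC 6716 is right: the SILK header bits (one VAD bit per 20 ms SILK frame, mid channel first) sit in the most significant bits of the first payload byte. It only handles packets that carry a single Opus frame (frame count code 0) and gives up on CELT-only packets, which have no SILK layer.

```python
from typing import List, Optional


def silk_vad_flags(packet: bytes) -> Optional[List[bool]]:
    """Return the per-SILK-frame VAD flags of one Opus packet.

    Returns None when the flags cannot be read by this sketch (empty
    input, CELT-only mode, or more than one Opus frame per packet).
    Returns an empty list for a packet with no payload (DTX or loss).
    """
    if not packet:
        return None

    toc = packet[0]
    config = toc >> 3              # top 5 bits: mode, bandwidth, duration
    frame_count_code = toc & 0x03  # bottom 2 bits: frames per packet

    if config >= 16:
        return None  # CELT-only: no SILK layer, so no VAD bits
    if frame_count_code != 0:
        return None  # multi-frame packets need real length parsing

    if len(packet) < 2:
        return []  # TOC byte only: nothing was encoded

    if config < 12:
        # SILK-only: durations cycle 10, 20, 40, 60 ms within each bandwidth,
        # which corresponds to 1, 1, 2, 3 SILK frames.
        silk_frames = (1, 1, 2, 3)[config & 0x03]
    else:
        # Hybrid: only 10 ms and 20 ms exist, so always one SILK frame.
        silk_frames = 1

    first = packet[1]
    # VAD bits for the mid channel are the leading bits, MSB first.
    return [bool((first >> (7 - i)) & 1) for i in range(silk_frames)]
```

A packet would then count as "voiced" if `any(flags)` is true. Treat this as a starting point only: I have not checked it against real encoder output, and packets using the other frame count codes would need the full framing rules from the RFC.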
Hello @sjpritchard
Were you successful? I also want to do the same thing. Can you tell me how you achieved it? Regards,
Hello,
any news on that? Thank you for your help!
Best regards, julian-w
I would like to use Concentus in an app that does Speech-To-Text conversion. I need to be able to detect the end of sentences by monitoring voice activity and identifying segments of speech terminated by periods of silence. I know Opus has Voice Activity Detection, but looking through the Concentus source code, VAD seems to only be used in internal classes for DTX, with no exposed public classes/methods. Ideally I'd be able to poll the encoder and get a count of recent consecutive silence frames, then capture the sentence after the silence frame threshold has been reached, and then submit that sentence to the STT engine.
Is there any way to get access to the built-in VAD status on the encoder? Or any other way to achieve what I want to achieve?
And thank you for this library!!!! :)
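To make the request concrete, the silence-counting logic described above could look roughly like the sketch below. It is plain Python and independent of Concentus: `split_utterances` and its parameters are hypothetical names, and the per-frame voiced/unvoiced flag is assumed to come from whatever VAD source ends up being available.

```python
from typing import Iterable, Iterator, List, Tuple


def split_utterances(
    frames: Iterable[Tuple[bytes, bool]],
    silence_threshold: int,
) -> Iterator[List[bytes]]:
    """Group (payload, is_voiced) frames into utterances.

    An utterance ends once `silence_threshold` consecutive unvoiced
    frames follow some speech. Trailing silence is trimmed, while
    shorter pauses inside an utterance are kept.
    """
    if silence_threshold < 1:
        raise ValueError("silence_threshold must be at least 1")

    current: List[bytes] = []
    silent_run = 0  # consecutive unvoiced frames since the last voiced one

    for payload, is_voiced in frames:
        if is_voiced:
            current.append(payload)
            silent_run = 0
        elif current:
            # Only count silence once speech has started.
            current.append(payload)
            silent_run += 1
            if silent_run >= silence_threshold:
                yield current[: len(current) - silent_run]
                current = []
                silent_run = 0

    if current:
        # Flush whatever speech is left when the stream ends.
        yield current[: len(current) - silent_run]
```

With 20 ms frames, a `silence_threshold` of 25 would correspond to about half a second of silence before a sentence is handed to the STT engine. The right value is something I would expect to tune by ear.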