secretsauceai / secret_sauce_ai

Secret Sauce AI: a coordinated community of tech minded AI enthusiasts
Apache License 2.0

How to do silence detection in python well? #18

Open secretsauceai opened 3 years ago

secretsauceai commented 3 years ago

Having better silence detection would aid in chopping up audio files containing wake word information to reduce false positives.

Currently, to make sure individual files capture only parts of the wake word recordings, I chop them by n + 2, where n is the number of syllables in the wake word. This works, but it misses many combinations of sounds (e.g. 'Jarvis' in 'hey Jarvis' would not be completely contained in any one piece).
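As a rough illustration of the fixed chopping described above, here is a minimal sketch. It assumes "chop by n + 2" means splitting a recording into n + 2 contiguous pieces of near-equal length, which is one reading of the description and not necessarily what the project's script does. The function name `chop_equal` is hypothetical.

```python
def chop_equal(samples, n_syllables):
    """Split samples into n_syllables + 2 contiguous, near-equal chunks.

    Assumption: this is an interpretation of "chop by n + 2",
    not the project's actual implementation.
    """
    pieces = n_syllables + 2
    size, extra = divmod(len(samples), pieces)
    chunks, start = [], 0
    for i in range(pieces):
        # Spread any remainder over the first `extra` chunks.
        end = start + size + (1 if i < extra else 0)
        chunks.append(samples[start:end])
        start = end
    return chunks


# 'hey Jarvis' has 3 syllables, so a recording is cut into 5 pieces.
chunks = chop_equal(list(range(11)), n_syllables=3)
```

Because the cut points are fixed fractions of the file, a word like 'Jarvis' will usually straddle two or more chunks, which is the limitation noted above.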

I tried some silence-removal experiments myself, based on this Stack Overflow question. However, the threshold must be provided manually, and I couldn't find a satisfactory one. Perhaps a dynamic threshold is needed?
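One possible way to make the threshold dynamic is to estimate the noise floor from the recording itself and scale it. The sketch below is only an illustration under stated assumptions: samples are floats in a plain list, frames do not overlap, and the names `frame_rms`, `dynamic_threshold`, `speech_regions`, `floor_pct` and `ratio` are hypothetical. The defaults (10th percentile, 3x) are arbitrary starting points, not tuned values.

```python
import math


def frame_rms(samples, frame_len):
    """RMS energy of each non-overlapping frame (a trailing partial frame is dropped)."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def dynamic_threshold(rms, floor_pct=0.1, ratio=3.0):
    """Take a low percentile of frame energies as the noise floor and scale it."""
    ordered = sorted(rms)
    floor = ordered[int(floor_pct * (len(ordered) - 1))]
    return floor * ratio


def speech_regions(samples, frame_len=160, floor_pct=0.1, ratio=3.0):
    """Return (start, end) sample indices of runs of frames above the threshold."""
    rms = frame_rms(samples, frame_len)
    if not rms:
        return []
    threshold = dynamic_threshold(rms, floor_pct, ratio)
    regions, start = [], None
    for i, energy in enumerate(rms):
        if energy > threshold and start is None:
            start = i  # a loud run begins
        elif energy <= threshold and start is not None:
            regions.append((start * frame_len, i * frame_len))
            start = None  # the loud run ended at this frame
    if start is not None:
        regions.append((start * frame_len, len(rms) * frame_len))
    return regions
```

This assumes at least a small part of each recording is background noise; if a file is speech from start to finish, the percentile estimate lands on speech energy and the threshold ends up too high.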

Here is an interesting code snippet to check if it works better.

Solution

However, I think the easiest solution for now is to add a feature to the wake word recording Python script that lets people add such recordings themselves. This level of recording (such as using 'Jarvis' as not-wake audio) was impossible on earlier models, before the data generation methods were perfected.

This is the easiest and most viable solution. But it would be cool to be able to chop up audio files automatically for syllables and even more complex sounds in the future.

Example

I want my wake word 'hey Jarvis' to work, but not to trigger on 'Jarvis' alone. So when prompted for extra not-wake-word input, I add 'Jarvis' with 2 recordings (one for training and one for testing; more data will be generated from these anyway).

JarbasAl commented 2 years ago

see how we do it in ovos https://github.com/OpenVoiceOS/ovos-core/tree/dev/mycroft/listener

mic.py and silence.py are the relevant files

basically between using noise threshold with some magic numbers and an optional VAD model we get pretty good silence/speech detection

re VAD plugins, silero seems to work best according to benchmarks but is painful to install at times; webrtcvad seems to work just about anywhere and i personally don't notice a difference in performance/accuracy

i think you should be able to adapt the silence.py file and automatically have support for our plugins, if you adapt mic.py you get support for the whole wake word stack. Those components are pretty much standalone and you can add ovos-core to requirements.txt and import them directly
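For readers who want to try the VAD route on its own, here is a minimal sketch of turning per-frame speech decisions into segments. It is not the ovos `silence.py` logic; `segment_speech`, `webrtcvad_detector` and the `hangover` parameter are hypothetical names for illustration. The only third-party calls used are `webrtcvad.Vad(mode)` and `Vad.is_speech(frame, sample_rate)`, which expect 10, 20 or 30 ms frames of 16-bit mono PCM; the import is lazy so the segmentation part runs without the package installed.

```python
def segment_speech(frames, is_speech, hangover=5):
    """Group frames into (start, end) frame-index ranges, end exclusive.

    A segment stays open until `hangover` consecutive non-speech frames
    are seen, so short pauses inside a word do not split it.
    """
    segments, start, silent_run = [], None, 0
    for i, frame in enumerate(frames):
        if is_speech(frame):
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= hangover:
                # Close the segment just after the last speech frame.
                segments.append((start, i - silent_run + 1))
                start, silent_run = None, 0
    if start is not None:
        segments.append((start, len(frames) - silent_run))
    return segments


def webrtcvad_detector(sample_rate=16000, aggressiveness=2):
    """Build an is_speech(frame) callable backed by webrtcvad (third-party)."""
    import webrtcvad  # imported lazily; only needed if this helper is used

    vad = webrtcvad.Vad(aggressiveness)
    return lambda frame: vad.is_speech(frame, sample_rate)
```

Any callable that maps a frame to a boolean can be passed as `is_speech`, so an energy threshold and a VAD model can be swapped or combined without changing the segmentation step.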