thewh1teagle / vad-rs

Speech detection using silero vad in Rust
https://crates.io/crates/vad-rs

Loudness normalization options #2

Open thisislvca opened 1 month ago

thisislvca commented 1 month ago

Hey man! Great job with the library, been super duper helpful.

I've been running some tests with live speech, which will be my use case, and I've seen that oftentimes when the audio gets normalized because it's too loud, for some reason the speech detection gets screwed up.

I think being able to choose to only make the sound louder when it's too quiet to be recognized would be a nice addition :)

thisislvca commented 1 month ago

Quick correction: the normalization on my end breaks the detection of every sample. The VAD does detect some speech, but not in more than one sample... I tried different mics and different loudness. I'm running on macOS 14.7 on an M1 MacBook.

thewh1teagle commented 1 month ago

Hey, I agree the VAD doesn't work perfectly. I'm not sure why; there must be some issue we haven't found yet. You can call the loudness normalization conditionally. I don't call it automatically, so it's up to you. The original silero vad implementation is in https://github.com/snakers4/silero-vad/tree/master/examples/rust-example. Maybe we've missed something important from there.
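
To illustrate what calling normalization conditionally could look like on the caller's side, here is a minimal sketch that boosts a block of samples only when it is too quiet and leaves loud audio untouched. It does not use any vad-rs function: `rms`, `boost_if_quiet`, `target_rms` and `max_gain` are hypothetical names made up for this example, and it assumes mono `f32` samples in the range [-1.0, 1.0].

```rust
/// Root-mean-square level of a block of samples (assumed to be in [-1.0, 1.0]).
/// Hypothetical helper, not part of vad-rs.
fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

/// Boost quiet audio toward `target_rms` and leave louder audio unchanged.
/// The gain is capped at `max_gain` so near-silence is not blown up into noise.
/// Returns the gain that was applied (1.0 means the block was not modified).
fn boost_if_quiet(samples: &mut [f32], target_rms: f32, max_gain: f32) -> f32 {
    let level = rms(samples);
    // Pure silence or already loud enough: do nothing.
    if level <= f32::EPSILON || level >= target_rms {
        return 1.0;
    }
    let gain = (target_rms / level).min(max_gain);
    for s in samples.iter_mut() {
        // Clamp so that peaks cannot leave the valid sample range.
        *s = (*s * gain).clamp(-1.0, 1.0);
    }
    gain
}
```

Calling `boost_if_quiet(&mut chunk, 0.1, 20.0)` on each chunk before handing it to the VAD would only ever raise the level; the `0.1` and `20.0` values are placeholders to tune against real recordings.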

thisislvca commented 1 month ago

Makes sense, agree! Thanks for the reply, will update this if I find anything new.

thisislvca commented 1 month ago

Hey! Playing around with the threshold helps. It'd be cool to have the option to edit the activation threshold however you like.

I also found the "Unknown" result pretty confusing - usually, you have the threshold, and if you want to have multiple thresholds you do it yourself...

Curious to hear your thoughts :)
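
A rough sketch of the idea above, assuming the model's raw speech probability is available to the caller as an `f32` between 0.0 and 1.0 (whether and how vad-rs exposes it is not shown in this thread). `is_speech`, `classify` and `Label` are hypothetical names for illustration only: the first is the single user-chosen threshold, the second shows how a caller could layer a second threshold on top when an in-between state is actually wanted.

```rust
/// Single-threshold decision: the caller picks the activation threshold.
/// Hypothetical helper, not part of vad-rs.
fn is_speech(probability: f32, threshold: f32) -> bool {
    probability >= threshold
}

/// Three-way label a caller could build on top if a middle band is useful.
#[derive(Debug, PartialEq)]
enum Label {
    Silence,
    Uncertain,
    Speech,
}

/// Two-threshold classification: below `low` is silence, at or above `high`
/// is speech, anything in between is left for the caller to interpret.
fn classify(probability: f32, low: f32, high: f32) -> Label {
    if probability >= high {
        Label::Speech
    } else if probability < low {
        Label::Silence
    } else {
        Label::Uncertain
    }
}
```

With this split, the simple case stays a one-line comparison, and the middle band only exists for callers who opt into it.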