fippo opened this issue 10 years ago
I agree with this. Do you have anything new on a more efficient VAD technique, perhaps one based on frequency ranges? I read that the human voice mostly lies between 100 Hz and 1000 Hz.
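To make the frequency-range idea concrete, here is a minimal sketch of a band-energy VAD. It assumes a mono frame of float samples in [-1, 1] and uses the 100 Hz to 1000 Hz range quoted above. The function names and the 0.5 threshold are hypothetical, and it uses a naive DFT for readability where a real implementation would use an FFT.

```python
import cmath
import math
from typing import Sequence


def band_energy_ratio(frame: Sequence[float], sample_rate: float,
                      low_hz: float = 100.0, high_hz: float = 1000.0) -> float:
    """Fraction of the frame's spectral energy that falls in [low_hz, high_hz]."""
    n = len(frame)
    total = 0.0
    in_band = 0.0
    # Naive DFT over the non-negative frequency bins 0..n/2 (O(n^2), sketch only).
    for k in range(n // 2 + 1):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2
        freq = k * sample_rate / n  # centre frequency of bin k in Hz
        total += power
        if low_hz <= freq <= high_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0


def is_speech(frame: Sequence[float], sample_rate: float, threshold: float = 0.5) -> bool:
    # Hypothetical decision rule: "speech" if most of the energy sits in the voice band.
    return band_energy_ratio(frame, sample_rate) >= threshold
```

A ratio alone will also fire on quiet hum or noise that happens to sit in the voice band, so in practice it would likely be combined with an overall loudness threshold.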
The approach in http://webee.technion.ac.il/Sites/People/IsraelCohen/Publications/CSL_June2013.pdf is what I would currently prefer (not the dominant-speaker part, but the rest). But finding the time is the problem...
That seems too complicated for me to help with. I'm a developer, not a PhD researcher in speech recognition. However, if you already know of a good VAD implementation, in whatever language, I can give it a try.
I suspect the current max(freq) strategy is somewhat unstable, since it just takes the maximum value across all bins and ignores which frequency that value comes from.
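The difference between taking the global maximum and taking the maximum only inside the voice band can be sketched as follows. The input list is assumed to look like the output of the Web Audio `AnalyserNode.getFloatFrequencyData` (one dB value per bin, `fftSize / 2` bins), and bin `k` is centred at `k * sampleRate / fftSize` Hz. The helper names and the 100 Hz to 1000 Hz band are illustrative assumptions, not the library's current code.

```python
from typing import Sequence


def bin_frequency(k: int, sample_rate: float, fft_size: int) -> float:
    # Bin k of an fft_size-point FFT is centred at k * sample_rate / fft_size Hz.
    return k * sample_rate / fft_size


def global_max_db(bins_db: Sequence[float]) -> float:
    # Loudest bin anywhere in the spectrum; the frequency it sits at is discarded.
    return max(bins_db)


def band_max_db(bins_db: Sequence[float], sample_rate: float, fft_size: int,
                low_hz: float = 100.0, high_hz: float = 1000.0) -> float:
    # Loudest bin whose centre frequency lies inside the assumed voice band.
    in_band = [v for k, v in enumerate(bins_db)
               if low_hz <= bin_frequency(k, sample_rate, fft_size) <= high_hz]
    return max(in_band) if in_band else float("-inf")
```

With a loud 5 kHz whistle and a quieter voice around 280 Hz, `global_max_db` reports the whistle, while `band_max_db` reports the voice.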
getByteTimeDomainData may let us calculate the root mean square (RMS) as described in http://tools.ietf.org/html/rfc6465#appendix-A.1. If that doesn't work, we'll have to figure out something with the FFT data.
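A rough sketch of the RMS idea: `getByteTimeDomainData` fills an array of unsigned bytes where 128 represents a zero sample, so the bytes are first mapped back to roughly [-1, 1]. The level conversion assumes the RFC 6464/6465 convention of expressing audio level as -dBov in the range 0 to 127 (127 meaning silence), with an RMS of 1.0 treated as 0 dBov. The function names are hypothetical, and the reference code in the linked appendix may differ in details such as overload handling and rounding.

```python
import math
from typing import Sequence


def rms_from_bytes(samples: Sequence[int]) -> float:
    # Unsigned byte samples: 128 is 0.0, so (b - 128) / 128 is roughly in [-1, 1).
    if not samples:
        return 0.0
    acc = 0.0
    for b in samples:
        x = (b - 128) / 128.0
        acc += x * x
    return math.sqrt(acc / len(samples))


def level_minus_dbov(rms: float) -> int:
    # Convert RMS to -dBov (full scale = 0 dBov), clamped to 0..127; 127 = silence.
    if rms <= 0.0:
        return 127
    db = 20.0 * math.log10(rms)
    return max(0, min(127, round(-db)))
```

A frame alternating between bytes 64 and 192 has an RMS of 0.5, which comes out at about 6 on this scale.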