
xiph / opus

Modern audio compression for the internet.

https://opus-codec.org/


VAD #131

Open JanX2 opened 5 years ago

JanX2 commented 5 years ago

After evaluating the few VADs I could find, I found the one included in this project to be far superior to the others. Many projects use the VAD from Google's WebRTC.

To evaluate your VAD, I built opusenc and logged the results similarly to what’s commented out here: https://github.com/xiph/opus/blob/cdaf661e8d3e85770bf06db8cff12ae6be7fa2a6/src/analysis.c#L938

After reading through the code a couple of times, several questions arose:

1. Why are all samples resampled to a sample rate of 48 kHz?
2. Could we work together to decouple the VAD/music detection from the rest of Opus further than it is now?

The latter would be very helpful in using the VAD independently of Opus.

jmvalin commented 5 years ago

1. CELT internally operates at a sampling rate of 48 kHz. Also, 48 kHz is guaranteed to work for all audio (i.e. it can represent the whole audible spectrum).
2. I believe Google was at some point using the speech/music detector in Chromium and/or WebRTC, so it's possible that work has already been done.
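To make the "whole audible spectrum" point concrete: a sample rate f_s can only represent frequencies up to f_s/2 (the Nyquist frequency). The sketch below assumes the commonly quoted ~20 kHz upper limit of human hearing and checks it against the input rates the Opus encoder API accepts (8, 12, 16, 24 and 48 kHz). The constant and helper names are illustrative only and do not come from the Opus source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed upper edge of human hearing, used only for this illustration. */
#define AUDIBLE_LIMIT_HZ 20000

/* Highest frequency a given sample rate can represent (Nyquist frequency). */
static int nyquist_hz(int sample_rate_hz)
{
    return sample_rate_hz / 2;
}

/* True if the rate's Nyquist frequency reaches the assumed audible limit. */
static bool covers_audible_band(int sample_rate_hz)
{
    return nyquist_hz(sample_rate_hz) >= AUDIBLE_LIMIT_HZ;
}

/* Input sample rates accepted by the Opus encoder API. */
static const int opus_input_rates_hz[] = {8000, 12000, 16000, 24000, 48000};

/* Counts how many of those rates cover the whole assumed audible band. */
static size_t count_full_band_rates(void)
{
    size_t count = 0;
    size_t n = sizeof opus_input_rates_hz / sizeof opus_input_rates_hz[0];
    for (size_t i = 0; i < n; i++) {
        if (covers_audible_band(opus_input_rates_hz[i]))
            count++;
    }
    return count;
}
```

Only 48 kHz passes: its Nyquist frequency is 24 kHz, while the next rate down (24 kHz) tops out at 12 kHz.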

BTW, did you evaluate the VAD that's in RNNoise? Also, care to share your results (and methodology)?

JanX2 commented 5 years ago

I just evaluated it empirically on the audio I want to use it with: long-form speech recordings that occasionally contain music. Sadly, there was no statistical evaluation; that's beyond my expertise.

1. Makes sense.
2. Interesting! I'll have to dig into that. Do you have any references for me?

I did evaluate RNNoise back when it was new, but it didn't work well with my material. I have checked back occasionally but did not see much activity there. Is there anything I have missed?

xinkez commented 5 years ago

@jmvalin I find the VAD used in the Opus code to be great. I want to change the frame duration from 20 ms to 16 ms, but I don't understand the principle behind the tonality_analysis function in the file of . There are many coefficients, for example, <band_log2[b+1] = .5f*1.442695f*(float)log(E+1e-10f);>. Could you please share some ideas about how to change the code, or point me to some papers? Thank you in advance.

alokprasad commented 4 years ago

@JanX2 Do you have a standalone repo of the VAD from the Opus tree?

JanX2 commented 4 years ago

@alokprasad No, just something I hacked together to play around with it.