xiph / opus

Modern audio compression for the internet.
https://opus-codec.org/

VAD #131

Open JanX2 opened 5 years ago

JanX2 commented 5 years ago

After evaluating a few VADs I could find, the one included in this project is far superior to the others. A lot of projects are using the VAD from Google’s WebRTC.

To evaluate your VAD, I built opusenc and logged the results similarly to what’s commented out here: https://github.com/xiph/opus/blob/cdaf661e8d3e85770bf06db8cff12ae6be7fa2a6/src/analysis.c#L938

After reading through the code a couple of times, several questions arose:

  1. Why are all samples resampled to a sample rate of 48 kHz?
  2. Could we work together to decouple the VAD/music detection to a greater extent than it is now?

The latter would be very helpful in using the VAD independently of Opus.

jmvalin commented 5 years ago
  1. CELT internally operates at a sampling rate of 48 kHz. Also, 48 kHz is guaranteed to work for all audio (i.e. it can represent the whole audible spectrum)
  2. I believe Google was at some point using the speech/music detector in chromium and/or webrtc, so it's possible that work's already been done

BTW, did you evaluate the VAD that's in RNNoise? Also, care to share your results (and methodology)?

JanX2 commented 5 years ago

I just evaluated it empirically on the audio I want to use it with: long-form speech recordings that occasionally contain music. No statistical evaluation, sadly. That’s beyond my expertise.

  1. makes sense.
  2. Interesting! I’ll have to dig into that. Any references you have for me?

I did evaluate RNNoise back when it was new. It didn’t work well with my material. I have occasionally had a look since, but did not see much movement there. Anything I have missed?

xinkez commented 5 years ago

@jmvalin I find that the VAD used in the Opus code is great. I want to change the frame duration from 20 ms to 16 ms, but I don't understand the principle behind the tonality_analysis function in the file of . There are many coefficients, for example, <band_log2[b+1] = .5f*1.442695f*(float)log(E+1e-10f);>. Could you please share some ideas about how to change the code, or point me to some papers? Thank you in advance.

alokprasad commented 4 years ago

@JanX2 Do you have a standalone repo of the VAD from the Opus tree?

JanX2 commented 4 years ago

@alokprasad No. Just something that’s hacked together to play around with it.