xiph / rnnoise

Recurrent neural network for audio noise reduction
BSD 3-Clause "New" or "Revised" License

Understanding the per-frequency gain applied per band #150

Open sevagh opened 3 years ago

sevagh commented 3 years ago

Hello, I'm having a hard time understanding the function interp_band_gain.

The paper says that the gain applied to the FFT at each frequency bin is the sum of the amplitudes of all the bands to which that frequency belongs.

In code it looks like:

  if (!silence) {
    compute_rnn(&st->rnn, g, &vad_prob, features);
    pitch_filter(X, P, Ex, Ep, Exp, g);
    for (i=0;i<NB_BANDS;i++) {
      float alpha = .6f;
      g[i] = MAX16(g[i], alpha*st->lastg[i]);
      st->lastg[i] = g[i];
    }
    interp_band_gain(gf, g);
#if 1
    for (i=0;i<FREQ_SIZE;i++) {
      X[i].r *= gf[i];
      X[i].i *= gf[i];
    }
#endif

The code for interp_band_gain is:

void interp_band_gain(float *g, const float *bandE) {
  int i;
  memset(g, 0, FREQ_SIZE);
  for (i=0;i<NB_BANDS-1;i++)
  {
    int j;
    int band_size;
    band_size = (eband5ms[i+1]-eband5ms[i])<<FRAME_SIZE_SHIFT;
    for (j=0;j<band_size;j++) {
      float frac = (float)j/band_size;
      g[(eband5ms[i]<<FRAME_SIZE_SHIFT) + j] = (1-frac)*bandE[i] + frac*bandE[i+1];
    }
  }
}

To my knowledge, the Bark frequency (critical) bands do not overlap. So how can any one frequency belong to more than one band?

sevagh commented 3 years ago

Why would I not just do (pseudocode):

float band_gains[24];

for (j = 0; j < nfft; ++j) {
    /* frequency of FFT bin j in Hz */
    float frequency = j * (float)sample_rate / nfft;
    if (band_0_left <= frequency && frequency < band_0_right)
        fft[j] *= band_0_gain;
    else if (band_1_left <= frequency && frequency < band_1_right)
        fft[j] *= band_1_gain;
    ...
}

sevagh commented 3 years ago

Where do these magic values come from?

static const opus_int16 eband5ms[] = {
/*0  200 400 600 800  1k 1.2 1.4 1.6  2k 2.4 2.8 3.2  4k 4.8 5.6 6.8  8k 9.6 12k 15.6 20k*/
  0,  1,  2,  3,  4,  5,  6,  7,  8, 10, 12, 14, 16, 20, 24, 28, 34, 40, 48, 60, 78, 100
};

This looks inherited from the Opus codebase. Is it some transform of Bark band edge frequencies to DFT indices?

How can I create my own?

guishengzhang commented 3 years ago

As far as I know from the paper, the band split is inherited from the Opus codec, and it is just an approximation of the Bark scale.
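
On the "magic values": judging by the comment above the table, each entry looks like a band edge frequency divided by 200 Hz, which is the bin spacing of a 5 ms frame (1 / 0.005 s). The `<<FRAME_SIZE_SHIFT` in interp_band_gain then rescales that index to the FFT actually in use. Below is a minimal sketch of building such a table from edge frequencies in Hz. It assumes FRAME_SIZE_SHIFT is 2 (so one table step covers 4 FFT bins); `hz_to_eband`, `make_eband_table` and `eband_to_bin` are hypothetical helpers, not functions from rnnoise or Opus.

```c
#include <assert.h>
#include <stddef.h>

#define EBAND_HZ 200        /* bin spacing of a 5 ms frame: 1 / 0.005 s */
#define FRAME_SIZE_SHIFT 2  /* assumed; check denoise.c in your checkout */

/* Hypothetical helper: nearest table index for an edge frequency in Hz. */
static short hz_to_eband(int hz) {
  assert(hz >= 0);
  return (short)((hz + EBAND_HZ / 2) / EBAND_HZ);
}

/* Hypothetical helper: fill out[0..n-1] from n edge frequencies in Hz.
   Edges must stay strictly increasing after rounding, otherwise a band
   would be empty and the interpolation would divide by zero. */
static void make_eband_table(const int *edges_hz, size_t n, short *out) {
  size_t i;
  for (i = 0; i < n; i++) {
    out[i] = hz_to_eband(edges_hz[i]);
    if (i > 0)
      assert(out[i] > out[i - 1]);
  }
}

/* Hypothetical helper: FFT bin index for a table entry, mirroring
   eband5ms[i] << FRAME_SIZE_SHIFT in interp_band_gain. */
static int eband_to_bin(short e) {
  return e << FRAME_SIZE_SHIFT;
}
```

Feeding the frequencies from the table's comment (0, 200, ..., 15600, 20000) through make_eband_table gives back the quoted values. Note that NB_BANDS and anything sized by it would also have to change if the number of edges changes, and a model trained on the original band layout would presumably need retraining.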

guishengzhang commented 3 years ago

"Rather than rectangular bands, we use triangular bands, with the peak response being at the boundary between bands. "

So adjacent bands overlap.
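
To make the overlap concrete: with triangular bands peaking at the band edges, every bin between two edges lies under exactly two triangles, and their weights add up to 1. Summing "weight times band gain" over all bands is then the same thing as the linear interpolation in interp_band_gain. Here is a minimal sketch that checks this numerically. It is not taken from the rnnoise sources: NB_BANDS = 22, FRAME_SIZE_SHIFT = 2 and FREQ_SIZE = 481 are assumed values, and `band_bin`, `tri_weight`, `gain_by_sum` and `max_abs_diff` are hypothetical helpers.

```c
#include <assert.h>
#include <string.h>

/* Assumed values; check them against denoise.c in your checkout. */
#define NB_BANDS 22
#define FRAME_SIZE_SHIFT 2
#define FREQ_SIZE 481

/* Table quoted earlier in this thread (short stands in for opus_int16). */
static const short eband5ms[NB_BANDS] = {
  0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 28, 34, 40, 48, 60, 78, 100
};

/* FFT bin at which band b has its peak. */
static int band_bin(int b) {
  assert(b >= 0 && b < NB_BANDS);
  return eband5ms[b] << FRAME_SIZE_SHIFT;
}

/* Same interpolation as the quoted interp_band_gain, except that memset
   is given a size in bytes so the whole float array is cleared. */
static void interp_band_gain(float *g, const float *bandE) {
  int i, j;
  memset(g, 0, FREQ_SIZE * sizeof(*g));
  for (i = 0; i < NB_BANDS - 1; i++) {
    int band_size = band_bin(i + 1) - band_bin(i);
    for (j = 0; j < band_size; j++) {
      float frac = (float)j / band_size;
      g[band_bin(i) + j] = (1 - frac) * bandE[i] + frac * bandE[i + 1];
    }
  }
}

/* Triangular response of band b at bin k: 1 at the band's own edge,
   falling linearly to 0 at the neighbouring edges. */
static float tri_weight(int b, int k) {
  int c = band_bin(b);
  if (k == c) return 1.f;
  if (k < c) {
    int lo;
    if (b == 0) return 0.f;
    lo = band_bin(b - 1);
    return k > lo ? (float)(k - lo) / (c - lo) : 0.f;
  } else {
    int hi;
    if (b == NB_BANDS - 1) return 0.f;
    hi = band_bin(b + 1);
    return k < hi ? (float)(hi - k) / (hi - c) : 0.f;
  }
}

/* Gain at bin k as a sum over every band the bin belongs to. */
static float gain_by_sum(const float *bandE, int k) {
  float sum = 0.f;
  int b;
  for (b = 0; b < NB_BANDS; b++)
    sum += tri_weight(b, k) * bandE[b];
  return sum;
}

/* Largest difference between the two views, over the bins that the
   interpolation loop actually writes (below the last band edge). */
static float max_abs_diff(const float *bandE) {
  float g[FREQ_SIZE], worst = 0.f;
  int k;
  interp_band_gain(g, bandE);
  for (k = 0; k < band_bin(NB_BANDS - 1); k++) {
    float d = g[k] - gain_by_sum(bandE, k);
    if (d < 0) d = -d;
    if (d > worst) worst = d;
  }
  return worst;
}
```

For any set of band gains, max_abs_diff should come out at float rounding level. The rectangular version from the pseudocode above would instead give a staircase, with the gain jumping at every band edge.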