jzi040941 / PercepNet

Unofficial implementation of PercepNet: A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech
BSD 3-Clause "New" or "Revised" License

about ERB band #25

Open rafle0 opened 3 years ago

rafle0 commented 3 years ago

Hello, Mr. Noh

I found that PercepNet uses triangular filters for its ERB bands, and I see you already noticed this and updated your repo. (I found this in the Personalized PercepNet paper: in Section 2, the authors say that PercepNet used triangular filters.)
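For readers unfamiliar with triangular band filters, here is a minimal sketch of how band energies could be accumulated with overlapping triangles whose peaks and zeros sit on the band borders. It is not taken from the PercepNet repo; the function name `triangularBandEnergy` and its parameters are hypothetical, and the doubling of the two edge bands that some implementations apply is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: accumulate per-band energy with triangular weights.
// `power`  holds the power spectrum, one value per FFT bin.
// `border` holds band borders as bin indices, in increasing order.
// Each bin between border[i] and border[i+1] is split linearly between
// band i and band i+1, so every triangle peaks on one border and
// reaches zero on the neighbouring borders.
std::vector<float> triangularBandEnergy(const std::vector<float>& power,
                                        const std::vector<int>& border) {
    assert(!border.empty());
    assert(power.size() >= static_cast<std::size_t>(border.back()));
    std::vector<float> energy(border.size(), 0.0f);
    for (std::size_t i = 0; i + 1 < border.size(); ++i) {
        const int width = border[i + 1] - border[i];
        for (int j = 0; j < width; ++j) {
            const float frac = static_cast<float>(j) / static_cast<float>(width);
            const float p = power[border[i] + j];
            energy[i] += (1.0f - frac) * p;  // falling slope of band i
            energy[i + 1] += frac * p;       // rising slope of band i+1
        }
    }
    return energy;
}
```

Because the two weights for each bin sum to one, the band energies add up to the total power of the bins covered by the borders.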

I think that we can get values of ERB subbands from the figure in this post: https://www.amazon.science/blog/how-amazon-chimes-challenge-winning-noise-cancellation-works

My estimate is: eband5ms[33] = {2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 23, 26, 29, 33, 37, 41, 46, 52, 58, 64, 71, 81, 91, 103, 117, 133, 154, 177, 206, 241, 283, 336, 400}. In this array, one unit is 50 Hz. It may differ slightly from the true values, but it should work. I started from two assumptions to get this array:

  1. Each subband border is a multiple of 50 Hz. You can infer this from the implementation of the Bark scale in RNNoise: its author used a slightly modified Bark scale, so that every subband border is a multiple of 50 Hz and every triangular filter has its minimum and maximum on subband borders.

  2. Every subband is at least 100 Hz wide, so that no subband contains just a single frequency bin.
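The two assumptions above can be checked mechanically against the estimated array. This is only a sketch: the names `kBinHz`, `kEstimatedEdges`, `edgesInHz` and `hasMinimumWidth` are made up for illustration, and only the spacing between adjacent entries of the array is checked.

```cpp
#include <cstddef>
#include <vector>

// One array unit is 50 Hz, as stated in the post.
const int kBinHz = 50;

// Band edges estimated in the post, in units of 50 Hz.
const std::vector<int> kEstimatedEdges = {
    2,   4,   6,   8,   10,  12,  14,  16,  18,  20,  23,
    26,  29,  33,  37,  41,  46,  52,  58,  64,  71,  81,
    91,  103, 117, 133, 154, 177, 206, 241, 283, 336, 400};

// Assumption 1: edges are stored as integer multiples of 50 Hz,
// so converting to Hz is a plain multiplication.
std::vector<int> edgesInHz(const std::vector<int>& edges) {
    std::vector<int> hz;
    hz.reserve(edges.size());
    for (int e : edges) hz.push_back(e * kBinHz);
    return hz;
}

// Assumption 2: adjacent edges are at least `minBins` units apart
// (minBins = 2 corresponds to 100 Hz).
bool hasMinimumWidth(const std::vector<int>& edges, int minBins) {
    for (std::size_t k = 0; k + 1 < edges.size(); ++k) {
        if (edges[k + 1] - edges[k] < minBins) return false;
    }
    return true;
}
```

Calling `hasMinimumWidth(kEstimatedEdges, 2)` returns true for this array, and the last entry of `edgesInHz(kEstimatedEdges)` is 20000, i.e. 20 kHz.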

Finally, I counted dots for about an hour and got these values (from 0 to 5000 Hz there are about 280 dots). I hope this helps.

Regards, Jaeyoung

jzi040941 commented 2 years ago

Hi, Jaeyoung

I've checked the ERB bands. Your second assumption was not being applied before; now that I have changed it, the result is similar to yours: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 31, 36, 41, 48, 56, 65, 75, 86, 99, 115, 132, 152, 175, 201, 230, 265, 304, 349, 400]

Your second assumption is applied in erbband.h, L71~75. That code had an error before, which I have now fixed.

    // impose minimum 100 Hz (2 nfft bins)
    for(int k=0; k<N; k++){
      if(nfftborder[k+1]-nfftborder[k]<2)
        nfftborder[k+1]+=(2-(nfftborder[k+1]-nfftborder[k]));
    }
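As a self-contained illustration of the same rule as the snippet above, the sketch below applies the minimum-width pass to a plain vector. The function name `imposeMinimumWidth` and the `minBins` parameter are hypothetical and not part of erbband.h, and the sketch does not clamp borders to the last FFT bin.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-alone version of the minimum-width pass.
// Walks the borders from low to high and pushes border[k+1] upward
// whenever the band [border[k], border[k+1]) is narrower than minBins.
// Because a pushed border is seen by the next iteration, a run of
// narrow bands is widened one after another.
void imposeMinimumWidth(std::vector<int>& border, int minBins) {
    for (std::size_t k = 0; k + 1 < border.size(); ++k) {
        const int width = border[k + 1] - border[k];
        if (width < minBins) {
            border[k + 1] += minBins - width;
        }
    }
}
```

For example, with `minBins = 2` the borders `{0, 1, 2, 3, 10}` become `{0, 2, 4, 6, 10}`: the first three bands are each widened to two bins, and the last band is already wide enough.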

I'm currently using erb_band->nfftborder instead, which is generated automatically by the erbband class initializer, so eband5ms is no longer used in this repo. Thank you for your estimate; it made it easy to find the error and fix it.