xiph / rnnoise

Recurrent neural network for audio noise reduction
BSD 3-Clause "New" or "Revised" License

What does y_train look like? #136

Open majianjia opened 4 years ago

majianjia commented 4 years ago

Hi,

I want to replicate RNNoise as a lighter version that can run on microcontrollers, so I will need to build and train a new model.

From my understanding, the input is the BFCCs of the noisy speech plus several additional features, and the NN learns one gain for each of the 22 bands. Is y_train the ground-truth gains for these 22 bands? If so, how do I calculate the ground-truth gains from the signals?

I am trying to train the NN using the MS dataset, as suggested in https://github.com/xiph/rnnoise/issues/116. The MS dataset provides both clean speech and noisy speech, with noise mixed in at 0 to 40 dB. I first tried using the 22 noisy BFCCs as input and the 22 clean BFCCs as y_train, which failed. I then realised that what we actually need is the gains, but I am not sure how to generate them from the data I have.
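For reference, here is a minimal sketch of one way such gains could be derived from paired clean and noisy frames. It assumes the target for each band is the square root of the ratio of clean-speech band energy to noisy-speech band energy, clipped to 1, which is how I read the "ideal gain" in the RNNoise paper. The equal-width band layout, the frame length of 480 samples, the `eps` value, and the function names are all illustrative assumptions, not what RNNoise itself uses (it uses Bark-like bands).

```python
import numpy as np

NB_BANDS = 22  # number of bands mentioned above


def make_band_edges(n_bins, nb_bands=NB_BANDS):
    """Hypothetical equal-width band edges over the FFT bins (not RNNoise's layout)."""
    return np.linspace(0, n_bins, nb_bands + 1).astype(int)


def band_energies(frame, band_edges):
    """Sum the power spectrum of one windowed frame inside each band."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    power = np.abs(spectrum) ** 2
    return np.array([power[lo:hi].sum()
                     for lo, hi in zip(band_edges[:-1], band_edges[1:])])


def ideal_band_gains(clean_frame, noisy_frame, band_edges, eps=1e-3):
    """Per-band target gain: sqrt(clean energy / noisy energy), clipped to 1.

    eps guards against division by zero in silent bands (assumed value).
    """
    e_clean = band_energies(clean_frame, band_edges)
    e_noisy = band_energies(noisy_frame, band_edges)
    gains = np.sqrt((e_clean + eps) / (e_noisy + eps))
    return np.minimum(gains, 1.0)


# Example: one 480-sample frame of a 440 Hz tone at 48 kHz plus white noise.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(480) / 48000)
noisy = clean + 0.1 * rng.standard_normal(480)
edges = make_band_edges(n_bins=480 // 2 + 1)
y_frame = ideal_band_gains(clean, noisy, edges)  # one row of y_train, shape (22,)
```

Stacking `y_frame` over all frames would give a y_train of shape (num_frames, 22), with the clean and noisy files aligned sample by sample.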

Any comments would help. Thanks!