wangkenpu / rsrgan

Robust Speech Recognition Using Generative Adversarial Networks (GAN)

Performance degradation using AISHELL as training data for feature mapping #2

Open opencvbaby opened 5 years ago

opencvbaby commented 5 years ago

I tried LSTM, Res-LSTM, and GAN-Res-LSTM with the same configurations, and all experiments showed performance degradation. I don't know what's wrong. The back-end ASR system is TDNN+LSTM; the front-end does feature mapping from aishell_train_clean+rvb to aishell_train_clean. Do you have any insights? Thank you very much!

wangkenpu commented 5 years ago

You mean the LSTM front-end is worse than the DNN front-end? If yes, maybe there are some "stupid" mistakes. For example, the output of the front-end is normalized: if your AM's input is raw features, you should reverse the CMVN before feeding the dereverberated features to the AM.
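
For illustration, here is a minimal numpy sketch of undoing CMVN before the AM, assuming you still have the per-dimension mean and standard deviation that normalized the front-end output (the function and variable names are hypothetical, not from this repo):

```python
import numpy as np

def reverse_cmvn(normalized_feats, mean, std):
    """Undo mean-variance normalization: CMVN computed
    x_norm = (x - mean) / std, so invert it elementwise."""
    return normalized_feats * std + mean

# Stand-ins for the front-end output and the real CMVN statistics;
# in a Kaldi setup the stats would come from the cmvn archive.
feats = np.random.randn(100, 40)  # (num_frames, feat_dim)
mean = np.zeros(40)
std = np.ones(40)

raw_domain_feats = reverse_cmvn(feats, mean, std)  # feed this to the AM
```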

If you mean the 4-layer LSTM-Res is a little worse than the 4-layer LSTM, or the 4-layer GAN-LSTM-Res is a little worse than the LSTM-Res, maybe you should tune some hyper-parameters, such as the dropout rate, the initial learning rate, l2_scale, and so on. Moreover, at the test stage, setting "moving_average=True" may be very helpful.
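
As a rough illustration of those knobs, a hypothetical configuration might look like this (the option names echo the comment above; the values are example starting points, not settings from this repo):

```python
# Hypothetical hyper-parameters for the LSTM-Res / GAN-LSTM-Res front-end.
train_config = {
    "dropout_rate": 0.2,    # raise if overfitting, lower if underfitting
    "learning_rate": 1e-4,  # initial learning rate; decay if loss plateaus
    "l2_scale": 1e-5,       # L2 regularization strength
}

test_config = {
    "moving_average": True,  # decode with moving-average model weights
}
```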

opencvbaby commented 5 years ago

My baseline has no front-end.
I applied anti-gcmvn (inverse global CMVN) and then LDA.

wangkenpu commented 5 years ago

This work is about front-end speech dereverberation with a fixed back-end AM. If your baseline is not front-end based dereverberation, how did you design your experiments?

opencvbaby commented 5 years ago

My experiment is as follows:

1. AM1 is trained on the augmented dataset data_sp_rvb_vol; the test set contains both clean and reverberant (rvb) subsets. This gives result1.
2. Both the training set and the test set go through the feature-mapping front-end, and AM2 is trained and tested on them. This gives result2.

I want result2 to be better than result1. Does that make sense?

wangkenpu commented 5 years ago

Suppose your AM2 is trained on data that has gone through the feature-mapping front-end. The rvb test-set results on AM2 should be better than on AM1. But when the clean set goes through the front-end and is tested on AM2, the results may be worse than on AM1.

Actually, you needn't train a new AM2; just testing on AM1 is OK.
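
In other words, one fixed AM is enough for the comparison. A self-contained sketch of that protocol, with stub decode() and front_end() functions standing in for the real Kaldi decoding and feature-mapping steps (all names hypothetical):

```python
import numpy as np

def decode(am, feats):
    """Stub: stands in for decoding with the fixed back-end AM
    and scoring the result (e.g. returning a WER)."""
    return float(np.mean(feats))  # placeholder score

def front_end(feats):
    """Stub: stands in for the feature-mapping dereverberation model."""
    return feats  # a real front-end would map rvb features toward clean

def evaluate(am, test_sets, frontend=None):
    results = {}
    for name, feats in test_sets.items():
        if frontend is not None:
            feats = frontend(feats)
        results[name] = decode(am, feats)
    return results

AM1 = object()  # the single fixed acoustic model
test_sets = {"clean": np.random.randn(100, 40),
             "rvb": np.random.randn(100, 40)}

result1 = evaluate(AM1, test_sets)                      # no front-end
result2 = evaluate(AM1, test_sets, frontend=front_end)  # with front-end
# Expectation: result2 improves on "rvb" and may regress slightly on "clean".
```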

opencvbaby commented 5 years ago

Oh, sorry, I made a mistake. I did not retrain an AM2; all test sets go through the same acoustic model, AM1. result1 is without the front-end, and result2 is with the feature-mapping front-end. result2 is worse than result1, even on the rvb test set.

wangkenpu commented 5 years ago

If your conclusion is that feature mapping is useless for speech dereverberation, I suggest you first read Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, "An experimental study on speech enhancement based on deep neural networks," IEEE Signal Processing Letters, vol. 21, no. 1, pp. 65–68, 2014. I have verified this framework many times on different corpora.

opencvbaby commented 5 years ago

I do think feature mapping makes sense, so I am confused by my results. orz...

opencvbaby commented 5 years ago

Thank you anyway ~