sky1456723 / Pytorch-MBNet

A PyTorch implementation of MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network

Should ReLU be used in mean_net_dnn/bias_net_dnn? #3

Open unilight opened 3 years ago

unilight commented 3 years ago

Hi, thanks for the implementation! I'm not sure it is reasonable to use ReLU in both mean_net_dnn and bias_net_dnn. It might be reasonable for mean_net_dnn, because the MOS value range is {1, ..., 5}, but I think we should allow negative bias values. Maybe tanh would be a better choice?
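For illustration, a minimal sketch of the change being suggested (the layer sizes below are placeholders, not the repo's actual configuration): swapping a trailing ReLU on the bias head for tanh so the predicted bias can take negative values.

```python
import torch.nn as nn

# Hypothetical sketch of the suggested change; layer sizes are
# placeholders, not the repo's actual configuration.
bias_net_dnn = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),   # hidden activations can stay ReLU
    nn.Linear(64, 1),
    nn.Tanh(),   # replaces a trailing nn.ReLU(): bias can now be negative
)
```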

sky1456723 commented 3 years ago

Hi @unilight, thanks for the issue. I think it is very reasonable to replace the activation in bias_net_dnn. It would be wonderful if you could share the results with and without the replacement here (I have not tried replacing it myself). When I implemented this model, I emailed the authors of the paper to ask about the activation function, and they said they use ReLU, so I inserted the activation function after every block in the paper. Maybe I should not have inserted the activation at the output of bias_net_dnn.
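The option described at the end of this comment would look roughly like the following sketch (again with invented layer sizes): keep ReLU between hidden blocks, but leave the final output linear so the bias is unconstrained in sign.

```python
import torch.nn as nn

# Hypothetical sketch: keep ReLU after the hidden blocks, but emit the
# bias as a raw linear output. Layer sizes are placeholders.
bias_net_dnn = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 1),  # no output activation: bias may be any real value
)
```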

unilight commented 3 years ago

Hi @sky1456723, thanks for the reply! I can try replacing it. I was also wondering how you split the VCC2018 dataset. In the original MOSNet implementation (https://github.com/lochenchou/MOSNet/blob/master/train.py#L72-L78), a random seed is set to ensure a deterministic split. Were the split files (data/<training/valid/testing>.txt) generated following the same procedure?
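For reference, a deterministic split in the style of the linked MOSNet code can be obtained by fixing the seed before shuffling the file list. The sketch below is an assumption for illustration; the seed, directory path, and split ratios are placeholders, not the values used by this repo or MOSNet.

```python
import glob
import random

# Hypothetical sketch of a seeded, deterministic split; seed, path, and
# ratios are placeholders, not this repo's actual procedure.
random.seed(1984)
wav_files = sorted(glob.glob("data/wav/*.wav"))  # sort so the shuffle is reproducible
random.shuffle(wav_files)

n_train = int(0.8 * len(wav_files))
n_valid = int(0.1 * len(wav_files))
splits = {
    "training": wav_files[:n_train],
    "valid": wav_files[n_train:n_train + n_valid],
    "testing": wav_files[n_train + n_valid:],
}

# Write data/<training/valid/testing>.txt so the exact split can be reused.
for name, files in splits.items():
    with open(f"data/{name}.txt", "w") as f:
        f.write("\n".join(files) + "\n")
```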

sky1456723 commented 3 years ago

Sorry for the late response. The split files in the current commit were generated randomly. I still need some time to find the original scripts that generated the split files so I can provide them. If you find that using a different train/dev/test split leads to very different results, please let me know too. Thank you.