Closed Kuray107 closed 1 year ago
Hey, thanks for your interest! Training can be started with `python train.py --base_dir /data/VoiceBank/ --batch_size 8 --gpus 4`. Hyperparameters such as `spec_factor`, `spec_abs_exponent`, `sigma_max`, etc. do not need to be explicitly specified, since the values used in the paper are set as defaults. Hope that helps!
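For context, `spec_factor` and `spec_abs_exponent` control an amplitude compression of the complex spectrogram before it is fed to the model. Below is a minimal sketch of that kind of transform; the function name and the default values (0.15 and 0.5) are illustrative assumptions here, not quoted from the repository.

```python
import numpy as np

def compress_spec(stft, spec_factor=0.15, spec_abs_exponent=0.5):
    """Amplitude-compressed complex spectrogram, SGMSE-style (sketch):
    the magnitude is raised to spec_abs_exponent, the phase is kept,
    and the result is scaled by spec_factor. Parameter values here are
    illustrative assumptions, not the repository's exact defaults."""
    mag = np.abs(stft) ** spec_abs_exponent
    return spec_factor * mag * np.exp(1j * np.angle(stft))

# Example: a complex bin with magnitude 4 and zero phase
x = np.array([4.0 + 0.0j])
y = compress_spec(x)
# |y| = 0.15 * 4**0.5 = 0.3
```

Compression like this reduces the dynamic range of speech spectrograms, which tends to make score-model training easier.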
Thanks for the reply! I retrained the model following your instructions but still get a similar result on the test set (PESQ ≈ 2.7). The pre-trained checkpoint you provide indeed achieves a PESQ of ≈ 2.9, so I suspect the default training setting on my side is not optimal. The GPUs I used for training are A40s, but that shouldn't make such a huge difference. Do you have any suggestions on what else to check? And, if possible, would you be willing to re-train the model with the default settings to confirm it reproduces the correct result?
I compared the released code with the code we used for the pre-trained model checkpoint, and there was indeed a mismatch in one hyperparameter. The pre-trained checkpoint uses `centered=True`, which should also be the default setting when training SGMSE+. We have updated the code accordingly. Thank you for bringing this issue to our attention and helping us find the bug in the code.
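To illustrate what a `centered` flag changes in an STFT front end: with centering, the signal is padded by half a window on both sides so each frame is centered on its time index (as in `torch.stft(center=True)`), which changes both the number of frames and the content of the edge frames. A minimal NumPy sketch, with illustrative names and sizes:

```python
import numpy as np

def frame_signal(x, n_fft=8, hop=4, centered=True):
    """Split x into STFT frames. With centered=True the signal is
    reflect-padded by n_fft//2 on both sides so each frame is centered
    on its time index, analogous to torch.stft(center=True).
    Names and defaults here are illustrative."""
    if centered:
        x = np.pad(x, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(x) - n_fft) // hop
    return np.stack([x[i * hop : i * hop + n_fft] for i in range(n_frames)])

x = np.arange(32, dtype=float)
print(frame_signal(x, centered=True).shape)   # (9, 8)
print(frame_signal(x, centered=False).shape)  # (7, 8)
```

A mismatch in this flag between training and the released defaults shifts every frame relative to the waveform, which plausibly explains a consistent drop in metrics.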
We retrained the model with the updated code on VoiceBank-Demand, and the model achieved PESQ: 2.93, ESTOI: 0.86, SI-SDR: 17.4, which is very similar to the values reported in the paper. The small deviation could be due to the stochastic nature of the method and the training procedure.
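Of the metrics above, SI-SDR is straightforward to compute directly. A sketch of the standard definition (not necessarily the exact evaluation script used for the paper's numbers):

```python
import numpy as np

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB (standard definition, sketch).
    The reference is rescaled by the optimal projection factor
    before the error energy is computed, so a global gain on the
    estimate does not change the score."""
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    error = estimate - target
    return 10 * np.log10(np.dot(target, target) / np.dot(error, error))

rng = np.random.default_rng(0)
s = rng.standard_normal(16000)
n = rng.standard_normal(16000)
print(si_sdr(s + 0.05 * n, s))  # light noise -> high SI-SDR
print(si_sdr(s + 0.50 * n, s))  # heavier noise -> lower SI-SDR
```

PESQ and ESTOI need dedicated implementations (e.g. the `pesq` and `pystoi` packages), so they are not sketched here.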
We encourage you to pull the updated code and start another training. Please let us know if it works properly now.
Hello Julius, thank you for the code update! I've re-run the experiment, and this time the evaluation result is good : ).
First of all, thank you very much for providing the code with such good quality!
I am currently trying to reproduce the model's results on the VB-DMD dataset, which I downloaded from the link here. For training I used the clean & noisy_trainset_28spk_wav, splitting off all 468 files from speaker p286 as my validation set. The command I used for training is as follows:
python train.py --base_dir VB-DMD_dataset/ --accelerator gpu --gpus 2 --batch_size 12 --no_wandb --max_epochs 160
To my surprise, the results on my validation set are very poor according to the TensorBoard logs: the PESQ score is about 2.2, and the ESTOI value converges to 0.82. However, when I test the model on the test set, the results are much closer to the paper's: the PESQ score is 2.73 (± 0.55), and the STOI score is 0.86 (± 0.10). Now here are my questions:
Thank you in advance for your time and help!