Hi, I can't reproduce his results either. How much PESQ can you reproduce so far?
Very low, only 3.2, which is far lower than the paper. By the way, how much PESQ can you reproduce?
I trained with the parameters published in the code, and the data processing follows the TSTNN code. At epoch 50, the PESQ is 3.24; after that, gen_loss starts to rise and the model does not converge.
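For reference, here is roughly how I compute PESQ, assuming the `pesq` pip package; this may differ from the authors' evaluation script:

```python
from pesq import pesq
import soundfile as sf

def wb_pesq(ref_path, deg_path, sr=16000):
    # Load reference (clean) and degraded (enhanced) waveforms
    ref, _ = sf.read(ref_path)
    deg, _ = sf.read(deg_path)
    # 'wb' = wide-band PESQ, the mode used for 16 kHz speech
    return pesq(sr, ref, deg, 'wb')
```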
Thanks for making the implementation details clear. It seems our results are similar. Did you downsample the data to 16 kHz in the same way as the CMGAN GitHub repo?
No, I downsampled the original Voice Bank+DEMAND data to 16 kHz myself. I think the results in the paper cannot be reproduced because of the hyperparameters and the learning rate.
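Roughly, my downsampling looked like the following, assuming librosa and soundfile are installed; this is my own script, not the authors':

```python
import librosa
import soundfile as sf

def downsample(in_path, out_path, target_sr=16000):
    # librosa.load resamples to target_sr while loading (48 kHz -> 16 kHz here)
    audio, _ = librosa.load(in_path, sr=target_sr)
    sf.write(out_path, audio, target_sr)
```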
Apart from the learning rate, which other hyperparameters do you consider crucial for reproducing the results? I recall that the authors address the learning rate in the paper, but in my experience I cannot reproduce the results with it.
I think the learning rate in this code is more suitable for speech separation models such as the classic Conv-TasNet. I also think the loss weights are very important, but I don't know what weight to assign to each loss to reach the optimum.
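To make that concrete, I assume the generator loss is a weighted sum along these lines; the term names and weights below are placeholders, not the values from the paper or the repo:

```python
# Placeholder weights -- the actual values are exactly what we are unsure about.
W_RI, W_MAG, W_TIME, W_GAN = 0.1, 0.9, 0.2, 0.05

def generator_loss(loss_ri, loss_mag, loss_time, loss_gan):
    # Weighted sum of complex-spectrum, magnitude, waveform and adversarial
    # terms; retuning these weights (and the learning rate) is what
    # reproduction seems to hinge on.
    return W_RI * loss_ri + W_MAG * loss_mag + W_TIME * loss_time + W_GAN * loss_gan
```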
Sorry about that, but this was never an issue for us, and we haven't received this complaint about PESQ before. Did you try the checkpoint in src/best_ckpt?
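Loading it should look roughly like this; the sketch below assumes the file holds a plain PyTorch state dict, so adjust to the actual code in the repo:

```python
import torch

def load_best_ckpt(model: torch.nn.Module, path: str = 'src/best_ckpt'):
    # Load the saved weights onto CPU and put the model in eval mode
    state = torch.load(path, map_location='cpu')
    model.load_state_dict(state)
    model.eval()
    return model
```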
Hi! Your paper and code are excellent! I have learned a lot about speech enhancement from the paper, and I find your code to be very well-structured and clear. Thank you so much!
I cannot reproduce the results in your paper. I just want to confirm some settings to run the experiments:
- About the `loss_weights`, do you use the setting in your paper or the one in this GitHub repo?
- About the number of epochs, do you use 50 as in the paper or 120 as in the GitHub repo?
- How do you select the final model for inference?
- Why do you set the utterance length to 16 * 16000 during testing?
- How do you downsample the audio? Could you share the script?
Thank you very much for your warm and detailed response. I will follow the instructions provided and make an effort to reproduce the results.
By the way, I have some follow-up questions:
Variable length avoids any normalization issues when splitting the tracks, and it is much more convenient than padding tracks to a predefined maximum length or splitting tracks that exceed it. No, actually not; the results in the paper are from the best checkpoint, not from multiple trials. However, your point is a very interesting insight and should be involved in our future studies. Thanks!
However, it is worth mentioning that across several training trials the results are fairly consistent.
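For illustration, the fixed-length alternative being compared against would look roughly like this; `pad_or_split` is a hypothetical helper, not code from the repo:

```python
import torch
import torch.nn.functional as F

MAX_LEN = 16 * 16000  # 16 s at 16 kHz, the test utterance length asked about above

def pad_or_split(wav: torch.Tensor, max_len: int = MAX_LEN):
    # Fixed-length strategy: zero-pad short tracks, chunk long ones.
    # Variable-length inference skips all of this and so avoids the
    # per-chunk normalization issues mentioned above.
    if wav.numel() <= max_len:
        return [F.pad(wav, (0, max_len - wav.numel()))]
    return list(torch.split(wav, max_len))
```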