yxlu-0102 / MP-SENet

MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra
MIT License

Questions with regards to reproducing training and inference result #30

Closed lhy000001 closed 2 months ago

lhy000001 commented 2 months ago

Dear authors,

Thank you for the great paper and for providing open-source code. I would like to clarify some details regarding your training and inference/evaluation.

Inference/evaluation

I used your model checkpoint and the VoiceBank+DEMAND (VB-DEMAND) dataset you shared through Google Drive to perform evaluation on the VB-DEMAND test set. Here are the results I obtained: PESQ: 3.4957, CSIG: 4.65187, CBAK: 3.86279, COVL: 4.13774.

These results differ slightly from the ones you shared in #9, which are: PESQ: 3.4957, CSIG: 4.72751, CBAK: 3.95033, COVL: 4.22494.

I am unsure what causes the differences in the CSIG, CBAK, and COVL scores, and wonder if you have any clues about it. For your information, I used librosa to load the test audios at a 16 kHz sampling rate and used pysepm.composite to compute these scores. My pysepm version is 0.1.

Training

In Section 3.1 of your Interspeech paper, it is written that "The learning rate was set initially to 0.0005 and halved every 30 epoch". Regarding this statement, may I clarify whether you stopped training every 30 epochs, halved the learning rate in the config file, and then resumed training?

May I also clarify whether the checkpoint you provided is the best checkpoint or the last checkpoint from the 100 epochs of training?

Thank you for reading and I hope to hear back from you.

yxlu-0102 commented 2 months ago

Hi,

For your first question, the tool used in the code to calculate the objective metrics was inherited directly from CMGAN. I also noticed that its results differ slightly from those calculated by the pysepm package, but to ensure a fair comparison with CMGAN, I used the tools they provided.

For your second question, in the conference version we halved the learning rate every 30 epochs. However, in subsequent work we found that exponential decay yields better results, so this repository uses the latter.
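The difference between the two schedules can be sketched in a few lines of plain Python (a minimal sketch: the initial learning rate of 0.0005, the halving factor, and the 30-epoch step come from the paper, while the per-epoch decay factor `gamma=0.99` is an assumed illustrative value, not a figure confirmed in this thread):

```python
def stepwise_halving_lr(initial_lr: float, epoch: int, step: int = 30) -> float:
    """Conference-paper schedule: halve the learning rate every `step` epochs."""
    return initial_lr * 0.5 ** (epoch // step)


def exponential_decay_lr(initial_lr: float, epoch: int, gamma: float = 0.99) -> float:
    """Repository-style schedule: multiply the learning rate by `gamma` each epoch.
    `gamma=0.99` is an assumed value for illustration only."""
    return initial_lr * gamma ** epoch


# Starting from the paper's initial learning rate of 0.0005:
lr_step = stepwise_halving_lr(0.0005, 60)   # two halvings -> 0.000125
lr_exp = exponential_decay_lr(0.0005, 1)    # one decay step, ≈ 0.000495
```

In a PyTorch training loop, the exponential variant corresponds to stepping a scheduler such as `torch.optim.lr_scheduler.ExponentialLR` once per epoch, so no stop-and-resume with a manually edited config is needed.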

Lastly, the checkpoint I put in this repository is the best checkpoint, not the last one.

lhy000001 commented 2 months ago

Thank you very much for addressing my questions.