This is an unofficial implementation of MFNet from the paper "A Mask Free Neural Network for Monaural Speech Enhancement".
arXiv: https://arxiv.org/abs/2306.04286
I appreciate the guidance and assistance of the paper's author. Based on our discussion, the following corrections apply:
1. The initial learning rate is 3e-4, not 0.0034 as stated in the paper.
2. The network input is the compressed spectrum, i.e., `input = sign(stdct) * sqrt(abs(stdct))`.
3. The DCT is applied without normalization.
The key STDCT code is included in this repository and may be useful to you.
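The repository's own STDCT code is not reproduced here. As a rough illustration only, the NumPy sketch below combines corrections 2 and 3: a short-time DCT built from an unnormalized DCT-II (the same convention as `scipy.fft.dct(type=2, norm=None)`) followed by the sign-preserving square-root compression. The frame length of 512, hop of 256, and square-root periodic Hann window are assumptions chosen so that overlap-add reconstructs the signal exactly; they are not taken from the paper, and the helper names `dct2_matrix`, `stdct`, `istdct`, `compress`, and `decompress` are hypothetical.

```python
import numpy as np


def dct2_matrix(n: int) -> np.ndarray:
    """Unnormalized DCT-II basis, matching scipy.fft.dct(type=2, norm=None):
    X[k] = 2 * sum_m x[m] * cos(pi * k * (2m + 1) / (2n))."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return 2.0 * np.cos(np.pi * k * (2 * m + 1) / (2 * n))


def sqrt_hann(win_len: int) -> np.ndarray:
    # Periodic Hann (symmetric window of length win_len + 1, last sample dropped),
    # square-rooted so analysis * synthesis windows multiply to a Hann window.
    return np.sqrt(np.hanning(win_len + 1)[:-1])


def stdct(x: np.ndarray, win_len: int = 512, hop: int = 256) -> np.ndarray:
    """Short-time DCT: window each frame and apply the unnormalized DCT-II.
    Returns an array of shape (num_frames, win_len). Frame sizes are assumptions."""
    num_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] for i in range(num_frames)])
    # Row-vector form of X = D @ frame for every frame at once.
    return (frames * sqrt_hann(win_len)) @ dct2_matrix(win_len).T


def istdct(spec: np.ndarray, hop: int = 256) -> np.ndarray:
    """Inverse STDCT: exact inverse DCT per frame, synthesis window, overlap-add."""
    num_frames, win_len = spec.shape
    inv_basis = np.linalg.inv(dct2_matrix(win_len))
    frames = (spec @ inv_basis.T) * sqrt_hann(win_len)
    out = np.zeros((num_frames - 1) * hop + win_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + win_len] += frame
    return out


def compress(spec: np.ndarray) -> np.ndarray:
    # Sign-preserving square-root compression; abs() keeps sqrt real-valued.
    return np.sign(spec) * np.sqrt(np.abs(spec))


def decompress(spec_c: np.ndarray) -> np.ndarray:
    # Inverse of compress(): square the magnitude and restore the sign.
    return np.sign(spec_c) * spec_c ** 2
```

Because the two square-root Hann windows multiply to a periodic Hann window, which sums to one at 50% overlap, `istdct(stdct(x))` matches `x` everywhere except the first and last half-frame, where only one frame contributes.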
This experiment did not utilize the warm-up strategy mentioned in the paper. Instead, following the author's recommendation, the training parameters were set as follows:
Performance of MFNet on the VoiceBank+DEMAND (VCTK) test set:
Model | PESQ | STOI (%) | SI-SNR (dB)
---|---|---|---
Noisy | 1.9799 | 92.11 | 8.4474
MFNet | 3.0141 | 94.56 | 18.7835
Note that these are the best test-set results obtained within the first 100 training epochs.
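The SI-SNR column reports the scale-invariant signal-to-noise ratio in dB. The repository's evaluation script is not shown here, so the following is only a sketch of the commonly used definition, assuming 1-D NumPy arrays and zero-mean normalization; `si_snr` and its `eps` guard are hypothetical names.

```python
import numpy as np


def si_snr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB between a 1-D estimate and a reference signal."""
    # Remove DC offsets so constant shifts do not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    s_target = scale * reference
    # Whatever the scaled reference does not explain counts as error.
    e_noise = estimate - s_target
    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return float(10.0 * np.log10(ratio))
```

Multiplying the estimate by a positive constant leaves the score unchanged, which is what distinguishes SI-SNR from plain SNR.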