starrynightdream closed this issue 1 month ago
Well, the time loss and the STFT consistency loss are different.
In our previous experiments, we found that the time loss improved only the SNR metric and had side effects on the perception-related metrics, so we removed it in the journal version.
The STFT consistency loss is intended to reduce the inconsistency in the STFT domain caused by the STFT transformation. Adding this loss can further enhance the perceptual quality of the speech.
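To make the difference concrete, here is a rough NumPy sketch of the two losses. It is only an illustration, not the repository's training code: the helper names (`stft`, `istft`, `time_loss`, `stft_consistency_loss`), the window and hop sizes, and the squared-error form of the consistency term are all assumptions chosen for the example. The time loss compares waveforms directly, while the consistency loss compares a predicted complex spectrum with the STFT of the waveform synthesized from it.

```python
import numpy as np

N_FFT = 400  # assumed frame length, for illustration only
HOP = 100    # assumed hop size, for illustration only


def stft(x, n_fft=N_FFT, hop=HOP):
    """Centered STFT with a periodic Hann window; returns (frames, bins)."""
    window = np.hanning(n_fft + 1)[:-1]
    x = np.pad(x, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=N_FFT, hop=HOP):
    """Inverse STFT by windowed overlap-add, normalized by the summed squared window."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (spec.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i in range(spec.shape[0]):
        out[i * hop : i * hop + n_fft] += frames[i]
        norm[i * hop : i * hop + n_fft] += window ** 2
    out = out / np.maximum(norm, 1e-8)
    pad = n_fft // 2
    return out[pad:-pad]  # drop the reflect padding added by stft()


def time_loss(clean, enhanced):
    """MAE between the clean waveform and the synthesized waveform."""
    return np.mean(np.abs(clean - enhanced))


def stft_consistency_loss(spec_hat):
    """Squared error between a predicted spectrum and the STFT of its own iSTFT."""
    spec_cons = stft(istft(spec_hat))
    diff = spec_hat - spec_cons
    return np.mean(diff.real ** 2 + diff.imag ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(1600)  # length is a multiple of HOP
    spec = stft(clean)
    # A spectrum computed from a real waveform is consistent (loss near zero).
    print("consistent spectrum:", stft_consistency_loss(spec))
    # An arbitrary perturbation usually does not correspond to any waveform.
    perturbed = spec + 0.1 * (
        rng.standard_normal(spec.shape) + 1j * rng.standard_normal(spec.shape)
    )
    print("perturbed spectrum:", stft_consistency_loss(perturbed))
    print("time loss:", time_loss(clean, istft(perturbed)))
```

In this sketch a spectrum obtained from a real waveform gives a consistency loss close to zero, while a spectrum predicted bin by bin generally does not, which is the inconsistency the loss penalizes. The sketch assumes the signal length is a multiple of the hop size so that the round trip preserves the frame count.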
Thank you. I get it. I will change the code.
Section B of the Methodology in the paper describes an STFT Consistency Loss.
The current code instead uses an MAE loss between the synthesized audio and the clean audio, known as the Time Loss.
Do the two losses have the same effect? Or is the current implementation in the code more effective?
The result I obtained from VoiceBank+DEMAND: 😇![image](https://github.com/yxlu-0102/MP-SENet/assets/45537018/86092def-7346-4a08-84bc-312baeae467b)