yxlu-0102 / MP-SENet

MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra
MIT License

About Time Loss And STFT Consistency Loss #34

Closed by starrynightdream 1 month ago

starrynightdream commented 1 month ago

Section B of the paper's Methodology describes an STFT consistency loss.

The current code instead uses an MAE loss between the synthesized audio and the clean audio, i.e. the time loss.
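For reference, a time loss of this kind can be sketched in PyTorch as below. This is a minimal illustration, not the repository's code: the function name `time_loss` and the toy tensors are hypothetical, and only `torch.nn.functional.l1_loss` is a real library call.

```python
import torch
import torch.nn.functional as F


def time_loss(enhanced_wav: torch.Tensor, clean_wav: torch.Tensor) -> torch.Tensor:
    """Mean absolute error (L1) between two waveforms of identical shape."""
    return F.l1_loss(enhanced_wav, clean_wav)


# Toy waveforms of shape [batch, samples]; the values are illustrative only.
clean = torch.tensor([[0.0, 0.5, -0.5, 1.0]])
enhanced = torch.tensor([[0.1, 0.4, -0.5, 0.8]])
loss = time_loss(enhanced, clean)
```

The loss is computed sample by sample in the waveform domain, so it never looks at the spectrum directly.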

Do the two losses have the same effect? Or is the loss in the current code more effective?

The results I obtained on VoiceBank+DEMAND: [image: results screenshot]

yxlu-0102 commented 1 month ago

Well, the time loss and the STFT consistency loss are different.

In our previous experiments, we found that the time loss improved only the SNR metric and had side effects on perception-related metrics, so we removed it in the journal version.

The STFT consistency loss is intended to reduce the inconsistency in the STFT domain caused by the STFT transformation. Adding this loss can further enhance the perceptual quality of the speech.
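One common way to write such a loss is to resynthesize the waveform from the predicted complex spectrum with the iSTFT, take the STFT of that waveform again, and penalize the squared difference between the two spectra. The sketch below assumes that formulation. The function name `stft_consistency_loss` and the STFT settings (`n_fft=400`, `hop=100`, Hann window) are illustrative assumptions, and the sketch leaves out any magnitude compression, so check the exact definition against the paper before using it.

```python
import torch


def stft_consistency_loss(spec: torch.Tensor, n_fft: int = 400, hop: int = 100) -> torch.Tensor:
    """Mean squared distance between a complex spectrum and STFT(iSTFT(spectrum)).

    spec: complex tensor of shape [batch, n_fft // 2 + 1, frames], for example
    built from a predicted magnitude and phase with torch.polar(mag, phase).
    """
    window = torch.hann_window(n_fft, device=spec.device)
    # Resynthesize the waveform implied by the predicted spectrum.
    wav = torch.istft(spec, n_fft, hop_length=hop, win_length=n_fft,
                      window=window, center=True)
    # Re-analyze it; this spectrum is consistent by construction.
    spec_consistent = torch.stft(wav, n_fft, hop_length=hop, win_length=n_fft,
                                 window=window, center=True, return_complex=True)
    diff = spec - spec_consistent
    # Squared error on the real and imaginary parts, averaged over all bins.
    return (diff.real ** 2 + diff.imag ** 2).mean()


torch.manual_seed(0)
window = torch.hann_window(400)

# A spectrum obtained from a real waveform is consistent: the loss is near zero.
wav = torch.randn(1, 1600)
consistent_spec = torch.stft(wav, 400, hop_length=100, win_length=400,
                             window=window, return_complex=True)
loss_consistent = stft_consistency_loss(consistent_spec)

# An arbitrary complex tensor is generally not a valid STFT: the loss is large.
random_spec = torch.randn(1, 201, 17, dtype=torch.complex64)
loss_random = stft_consistency_loss(random_spec)
```

Because overlapping frames share samples, not every complex matrix is the STFT of some waveform. The loss measures how far the network's output is from one that is, which a waveform-domain MAE does not capture.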

starrynightdream commented 1 month ago

Thank you. I get it. I will change the code.