yl4579 / StarGANv2-VC

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
MIT License
466 stars 110 forks source link

Why the norm consistency loss can help to preserve the speech/silence intervals of generated samples and thus decrease the noises? #24

Closed Charlottecuc closed 2 years ago

yl4579 commented 2 years ago

Because silence has a lower norm (i.e. volume) than speech part, and hence the model learns to distinguish between noise and speech by introducing this loss term.