yl4579 / StarGANv2-VC

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
MIT License
466 stars 110 forks

Any reason why using consistency regularization? #26

Closed. MingjieChen closed this issue 2 years ago

MingjieChen commented 2 years ago

Hello,

I found that consistency regularization is quite important for the training process, but you did not mention it in the paper. Could you please explain why it is used?

Thanks

Charlottecuc commented 2 years ago

@MingjieChen I think it's for data augmentation. See https://zhuanlan.zhihu.com/p/378269256 (Balanced Consistency Regularization, bCR).

yl4579 commented 2 years ago

It is only important for datasets with poor recording quality. For example, you don't need it for the JVS dataset (Japanese) because it was recorded in a noise-free studio, but VCTK, on the other hand, was recorded in the speaker's home. The regularization is there to make sure the model does not overfit to the background noise.
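
For readers new to the idea, here is a rough sketch of what a bCR-style consistency term could look like. It is a framework-free toy, not the code used in this repository: `time_shift`, `consistency_loss`, and the two toy discriminators are hypothetical names chosen for illustration, and a circular time shift is only one assumed choice of augmentation. The term penalizes the discriminator when its score changes under an augmentation of either a real or a generated sample.

```python
from typing import Callable, List, Sequence

# A spectrogram is modeled as a list of frames, each a list of mel-bin values.
Spectrogram = List[List[float]]


def time_shift(spec: Spectrogram, shift: int) -> Spectrogram:
    """Hypothetical augmentation: circularly shift the frames along time."""
    if not spec:
        return []
    k = shift % len(spec)
    if k == 0:
        return [list(frame) for frame in spec]
    return [list(frame) for frame in spec[-k:] + spec[:-k]]


def consistency_loss(
    discriminator: Callable[[Spectrogram], float],
    real_batch: Sequence[Spectrogram],
    fake_batch: Sequence[Spectrogram],
    augment: Callable[[Spectrogram], Spectrogram],
) -> float:
    """Mean squared difference between D(x) and D(T(x)).

    "Balanced" here means the penalty is applied to both real and
    generated samples, as in bCR.
    """
    samples = list(real_batch) + list(fake_batch)
    if not samples:
        return 0.0
    total = sum((discriminator(x) - discriminator(augment(x))) ** 2 for x in samples)
    return total / len(samples)


# Toy discriminators (assumptions, for illustration only).
def mean_energy(spec: Spectrogram) -> float:
    """Score that ignores frame order, so it is invariant to time shifts."""
    return sum(sum(frame) for frame in spec) / (len(spec) * len(spec[0]))


def first_frame_sum(spec: Spectrogram) -> float:
    """Score that depends on frame position, so it changes under time shifts."""
    return sum(spec[0])


spec = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
shift_by_one = lambda s: time_shift(s, 1)

invariant_penalty = consistency_loss(mean_energy, [spec], [spec], shift_by_one)
sensitive_penalty = consistency_loss(first_frame_sum, [spec], [spec], shift_by_one)
```

In this toy setup the shift-invariant discriminator incurs no penalty, while the position-sensitive one does. In actual training, a term like this would be added to the discriminator loss with some weight, and the augmentations would be chosen so that they should not change the real/fake decision.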