yl4579 / StarGANv2-VC

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
MIT License
466 stars 110 forks

Are there any details about the neutral to emotional conversion? #7

Closed 980202006 closed 2 years ago

980202006 commented 2 years ago

Hello, on the sample page I listened to the neutral-to-emotional conversion audio and was surprised by the result, because it does two things at once: emotion conversion and voice conversion. Are two style encoders used? And are there any changes to the training procedure or model structure compared to the original model?

yl4579 commented 2 years ago

It uses the exact same architecture but is trained on a different dataset. You simply need to change the dataset and you will get emotion conversion for free, as training is completely unsupervised. The emotions or styles are learned automatically.
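As a rough illustration of what "changing the dataset" could look like in practice, here is a minimal sketch that builds train/validation file lists from an emotional speech corpus. Everything in it is an assumption for illustration: the `build_file_lists` helper is hypothetical, the directory layout (one folder per speaker, with emotion subfolders) is made up, and the `path|speaker_index` line format should be checked against the repository's own data lists before use. Note that no emotion label is written anywhere. Only the speaker index is recorded, which is consistent with the styles being learned automatically.

```python
from pathlib import Path
import random


def build_file_lists(data_root, val_fraction=0.1, seed=0):
    """Hypothetical helper: collect wav files and label each with a speaker index.

    Assumed layout (not taken from the repository):
        data_root/<speaker>/<any subfolders, e.g. emotion>/<clip>.wav
    Assumed output line format: "<path>|<speaker_index>".
    """
    root = Path(data_root)

    # Every top-level folder is treated as one speaker (one domain).
    speakers = sorted(p.name for p in root.iterdir() if p.is_dir())
    speaker_to_index = {name: i for i, name in enumerate(speakers)}

    lines = []
    for name in speakers:
        # rglob descends into emotion subfolders; the emotion itself is
        # deliberately not recorded in the label.
        for wav in sorted((root / name).rglob("*.wav")):
            lines.append(f"{wav.as_posix()}|{speaker_to_index[name]}")

    # Deterministic shuffle, then hold out a fraction for validation.
    rng = random.Random(seed)
    rng.shuffle(lines)
    n_val = int(len(lines) * val_fraction)
    return lines[n_val:], lines[:n_val], speaker_to_index
```

The two returned lists can then be written out one entry per line (for example with `Path("train_list.txt").write_text("\n".join(train))`) and pointed to from the training configuration, leaving the model code untouched.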

980202006 commented 2 years ago

What an excellent discriminator design, thank you!