Are there any details about the neutral-to-emotional conversion?

yl4579 / StarGANv2-VC

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

MIT License

466 stars, 110 forks

Are there any details about the neutral-to-emotional conversion? #7

Closed. 980202006 closed this issue 2 years ago

980202006 commented 2 years ago

Hello, on the sample page I listened to the neutral-to-emotional audio and was surprised by the result, because the model does two things at once: emotion conversion and voice conversion. Are two style encoders used? And are there any changes to the training procedure or model structure compared to the original model?

yl4579 commented 2 years ago

It uses the exact same architecture but is trained on a different dataset. You simply need to change the dataset and will get emotion conversion for free, as training is completely unsupervised. The emotions or styles are learned automatically.
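As a rough illustration of what "changing the dataset" could look like in practice, here is a minimal sketch that builds a training list from an emotional speech corpus. Everything in it is an assumption made for illustration, not something confirmed in this thread: the `root/<speaker>/<emotion>/*.wav` directory layout is hypothetical, `build_train_list` is a hypothetical helper, and the `wav_path|speaker_index` line format is assumed to be what the training script expects. The point it tries to show is that only the speaker is written as the domain label; the emotion is never labeled, which matches the statement that styles are learned automatically.

```python
from pathlib import Path


def build_train_list(root):
    """Return 'wav_path|speaker_index' lines for every .wav under root/<speaker>/.

    Assumption: each first-level folder under `root` is one speaker, and any
    deeper folders (e.g. one per emotion) are just storage, not labels.
    """
    root = Path(root)
    # Sort speaker names so the index assignment is reproducible across runs.
    speakers = sorted(p.name for p in root.iterdir() if p.is_dir())
    speaker_to_index = {name: i for i, name in enumerate(speakers)}

    lines = []
    for name in speakers:
        # rglob also descends into emotion subfolders; the emotion name is
        # deliberately NOT written into the label.
        for wav in sorted((root / name).rglob("*.wav")):
            lines.append(f"{wav.as_posix()}|{speaker_to_index[name]}")
    return lines
```

Writing the result to a list file would then be something like `Path("train_list.txt").write_text("\n".join(build_train_list("Data/emotional_corpus")) + "\n")`, where both paths are placeholders. If the number of speakers differs from the original setup, the number of domains in the training config would presumably need to match it.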

980202006 commented 2 years ago

What an excellent discriminator design, thank you!