jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
https://jaywalnut310.github.io/vits-demo/index.html
MIT License
6.53k stars, 1.22k forks

VITS is awesome! Can VITS be trained on an emotional dataset? #98

Open akfheaven opened 1 year ago

akfheaven commented 1 year ago

I've tried a normal speech dataset and it generated a very natural voice. But what about training with an emotional dataset? Has anyone tried it?

DoubleClickHong commented 1 year ago

I was wondering about the same thing. @akfheaven, curious if you've tried that.

nikich340 commented 1 year ago

It's possible, but you should probably mark the input with some special symbols (at the end?), much like what happens when we write "[text]?" or "[text]!" instead of the usual "[text]."
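
A rough sketch of what that marking could look like when preparing the training filelist. Everything here is an assumption for illustration: the `path|text` line layout (single-speaker style), the marker characters, and the helper names are all hypothetical, and any marker you choose would also have to be present in the model's symbol set before training, otherwise the text frontend would likely drop or reject it.

```python
# Hypothetical mapping from an emotion label to a one-character marker.
# These characters are placeholders, not something the repo defines.
EMOTION_MARKERS = {
    "neutral": "",   # leave neutral utterances unchanged
    "happy": "^",
    "angry": "#",
    "sad": "~",
}


def tag_transcript(text: str, emotion: str) -> str:
    """Append the emotion marker after the existing end punctuation."""
    marker = EMOTION_MARKERS[emotion]  # raises KeyError on unknown labels
    return text.rstrip() + marker


def tag_filelist_line(line: str, emotion: str) -> str:
    """Tag one filelist line, assuming the layout 'wav_path|text'."""
    path, text = line.rstrip("\n").split("|", 1)
    return f"{path}|{tag_transcript(text, emotion)}"


if __name__ == "__main__":
    print(tag_filelist_line("wavs/0001.wav|I can't believe it!", "happy"))
    print(tag_filelist_line("wavs/0002.wav|See you tomorrow.", "neutral"))
```

Keeping the original "?" or "!" and adding the marker after it preserves the intonation cue the model already learns from punctuation. Whether a single trailing symbol is enough to carry emotion is untested here.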