syang1993 / gst-tacotron

A TensorFlow implementation of "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"

Tone transfer #13

Open switchzts opened 6 years ago

switchzts commented 6 years ago

I want to know whether this model only learns the rhythm of the sentence you provide, rather than the tone. Can I use this model to imitate the tone of a person's speech from a single sentence?

syang1993 commented 6 years ago

The style is learned in an unsupervised way, which means there is no constraint that makes the model focus only on prosody. If you read Google's other paper, you will find that it may also learn some speaker information.
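
To make this concrete, here is a minimal sketch of a style token layer, assuming a simplified single-head attention (the paper uses multi-head attention). This is plain Python for illustration, not the code from this repository; the names `style_embedding`, `tokens`, and `query`, and the sizes, are hypothetical. The query stands in for the output of the reference encoder. Nothing in the computation separates prosody from speaker identity: the tokens simply absorb whatever helps reconstruct the reference audio.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def style_embedding(query, tokens):
    """Attend over a bank of style tokens with one query vector.

    query  : list of floats, assumed to come from a reference encoder
    tokens : list of token vectors (learned freely during training)
    Returns the style embedding and the attention weights.
    """
    # tanh keeps token values bounded before attention
    keys = [[math.tanh(v) for v in tok] for tok in tokens]
    scale = math.sqrt(len(query))
    # Scaled dot-product score between the query and each token
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    # Style embedding is the attention-weighted sum of the tokens
    style = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return style, weights


random.seed(0)
NUM_TOKENS, TOKEN_DIM = 4, 3  # hypothetical toy sizes
tokens = [[random.gauss(0, 0.5) for _ in range(TOKEN_DIM)] for _ in range(NUM_TOKENS)]
query = [random.gauss(0, 1) for _ in range(TOKEN_DIM)]

style, weights = style_embedding(query, tokens)
print("attention weights:", weights)
print("style embedding:", style)
```

The only training signal is the reconstruction loss of the synthesizer, so which attributes the tokens end up encoding depends on what varies in the training data.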

switchzts commented 6 years ago

@syang1993 Thanks for the reply. Does that mean the training data needs sentences from the same person spoken with different rhythms? What data is in Blizzard Challenge 2013? I am still downloading it. Is it a training set of one speaker with different rhythms?

syang1993 commented 6 years ago

The Blizzard 2013 dataset is audiobook data from a single speaker, and it contains rich prosody. Besides, if you use neural data to train this model, the model will not learn the prosody information. It may work as traditional tacotron.

GengwangGitHub commented 5 years ago

@syang1993 Hi, thanks for your nice work. As you mentioned above: "Besides, if you use neural data to train this model, the model will not learn the prosody information. It may work as traditional tacotron". What do you mean by "neural data"? Now, working from your published code, my model hardly learns the prosody information, and how can I next