syang1993 / gst-tacotron

A TensorFlow implementation of "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"
368 stars, 110 forks

Can we synthesize speaker A's tone with speaker B's prosody? #41

Open niu0717 opened 4 years ago

niu0717 commented 4 years ago

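When I read the GST paper, I found that the reference embedding seems to contain not only the style tokens but also the speaker's tone. In other words, can we separate the prosody from the reference audio as much as possible, so that speaker A's tone can be combined with speaker B's prosody?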