syang1993 / gst-tacotron

A TensorFlow implementation of the paper "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis".

Eval on soft voices #17

Open · fazlekarim opened this issue 6 years ago

fazlekarim commented 6 years ago

I noticed that no matter how soft the reference audio is, the synthesized output is always loud. Are we really capturing style with the style tokens if the model can't tell loud speech from soft speech?
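One way to check whether loudness is being lost in the model or only at output time is to compare the RMS level (in dBFS) of the reference and the synthesized wav. The sketch below uses only NumPy and synthetic sine tones standing in for a soft and a loud reference. The `peak_normalize` helper is hypothetical: it models the kind of peak rescaling that many Tacotron-style codebases apply just before writing a wav. Whether gst-tacotron's own save routine does this is an assumption to verify in its audio utilities. If such a step exists, it would make every output equally loud regardless of what the style tokens encode.

```python
import numpy as np

def rms_dbfs(wav: np.ndarray) -> float:
    """Root-mean-square level in dB relative to full scale (amplitude 1.0)."""
    rms = np.sqrt(np.mean(np.square(wav.astype(np.float64))))
    return float(20.0 * np.log10(max(rms, 1e-10)))

def peak_normalize(wav: np.ndarray, peak: float = 1.0) -> np.ndarray:
    """Hypothetical save-time step (assumption, not taken from gst-tacotron):
    rescale so the largest absolute sample reaches `peak`."""
    return wav * (peak / max(0.01, np.max(np.abs(wav))))

sr = 16000                          # assumed sample rate for the demo
t = np.arange(sr) / sr              # one second of samples
tone = np.sin(2 * np.pi * 220.0 * t)
soft = 0.05 * tone                  # stand-in for a quiet reference
loud = 0.8 * tone                   # stand-in for a loud reference

# Before normalization the two levels differ by about 24 dB.
print(rms_dbfs(soft), rms_dbfs(loud))
# After peak normalization the difference disappears.
print(rms_dbfs(peak_normalize(soft)), rms_dbfs(peak_normalize(loud)))
```

If the measured level of the model's raw (pre-save) output does track the reference but the written files all sit at the same level, the loudness cue is being removed at save time rather than ignored by the style tokens.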