Open fazlekarim opened 6 years ago
I noticed that regardless of how soft the reference voice is, the output is always loud. Are we really able to capture the style token if we can't detect what is loud and what is soft?