Closed fatihkiralioglu closed 4 years ago
It would be better to check the attention plots using the benchmark notebook. Without visuals, it is hard to guess where the problem is.
If the problem is number-specific, you can also expand numbers to their textual forms before TTS.
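As a rough illustration of the number-expansion idea, here is a minimal pure-Python sketch. The names `number_to_words` and `expand_numbers` are hypothetical helpers made up for this example and are not part of any TTS library. It only handles plain non-negative integer digit runs, and it joins words with spaces instead of hyphens on the assumption that the model's character set may not include a hyphen.

```python
import re

# Hypothetical lookup tables for this sketch (not taken from any TTS library).
_ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty",
         "sixty", "seventy", "eighty", "ninety"]
_SCALES = [(10**9, "billion"), (10**6, "million"), (10**3, "thousand")]


def number_to_words(n: int) -> str:
    """Spell out a non-negative integer in English words."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + (" " + _ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        words = _ONES[hundreds] + " hundred"
        return words + (" " + number_to_words(rest) if rest else "")
    # Peel off the largest scale word (billion, million, thousand) and recurse.
    for value, name in _SCALES:
        if n >= value:
            head, rest = divmod(n, value)
            words = number_to_words(head) + " " + name
            return words + (" " + number_to_words(rest) if rest else "")


def expand_numbers(text: str) -> str:
    """Replace every run of ASCII digits in `text` with its spelled-out form."""
    return re.sub(r"[0-9]+", lambda m: number_to_words(int(m.group())), text)


if __name__ == "__main__":
    print(expand_numbers("345670"))
    print(expand_numbers("the number is 345670"))
```

You would run the input text through `expand_numbers` before phoneme conversion, so the model never sees a bare digit string. Decimals, ordinals, dates, and currency are not covered here and would need their own rules.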
Hi, I have trained an English Tacotron model with a custom database, and the synthesis quality is quite good. The only issue is that for some text inputs, the inference process outputs just noise. For example, I have shared two sample texts and their corresponding synthesis results (mozilla_test.zip):
"the number is 345670" -> synthesis is good
"345670" -> noise
There seems to be no problem with phoneme expansion or phoneme-to-sequence conversion.
Thanks.