Closed ProjectLSD closed 3 years ago
It is hard to tell whether the trembling voice is related to alignment. Your plots look good to me. You could try reducing the learning rate a bit, since I see a couple of jumps here and there.
@erogol Thank you for your quick response. I'll lower the learning rate and try again.
I'd also check whether any of these datasets contain recordings with a significant amount of silence or pauses. Tacotron, and attention-based models in general, are not very robust to those, especially long pauses between words.
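One rough way to screen recordings for long pauses is to measure the longest run of low-energy frames in each file. The sketch below is only an illustration using the Python standard library. The function names, the 20 ms frame size, and the -40 dB threshold are assumptions made for this example, not values taken from the TTS codebase, and the reader assumes 16-bit mono PCM WAV input.

```python
import math
import struct
import wave


def longest_silence_seconds(samples, sample_rate, frame_ms=20, threshold_db=-40.0):
    """Return the longest run of consecutive low-energy frames, in seconds.

    samples: sequence of floats in [-1.0, 1.0].
    frame_ms and threshold_db are illustrative defaults (assumptions).
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    longest = current = 0
    # Walk over non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        level_db = 20.0 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db < threshold_db:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest * frame_len / sample_rate


def read_wav_mono16(path):
    """Read a 16-bit mono PCM WAV file into floats in [-1, 1)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = [s / 32768.0 for s in struct.unpack("<%dh" % (len(raw) // 2), raw)]
    return samples, rate
```

To use it, call `read_wav_mono16` on each file and flag any recording where `longest_silence_seconds` exceeds a limit you choose (for example, half a second), then listen to the flagged files before deciding whether to trim or drop them.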
@oytunturk Thank you for your reply. Let me check my dataset.
Feel free to open again if needed.
Hello, thanks to your work, I was able to produce the voice I wanted. However, the "avg_align_error" value does not fall below a certain level, and adding more data did not lower it.
Here are my config.json and TensorBoard results.
My datasets are [korean_all (12h50m), google (14h15m), zeroth_f (6h16m), corpus_226_227_228 (8h48m)].
And this is the generated audio file: test_audio.zip. You can hear a trembling sound when you listen to it. Is that related to avg_align_error?
How can I lower the "avg_align_error"?