Open nikigre opened 5 years ago
Show the attention plots, please. And remember that for unseen labels you have to reach at least 300k steps.
Hi! Here are the last 4 images and sounds. Plot.zip I am synthesizing Slovenian. Yes, but I made up a sentence built from words that I know I have recorded, and I still almost always get unrecognisable sound. Thank you!
Your charts show it has not yet aligned.
Hi! What is the minimum amount of recordings (in hours) for acceptable results? Which language are you working with, @el-tocino? Thank you
I use English. I started with about 1000 recordings, which didn't work well, and have since moved up to 6000.
Hi @el-tocino! How long are these recordings in total (minutes/hours)?
Average clip length was 3.3s. The shortest was 0.5s, the longest 9.6s. Now up to several hours, not sure exactly without counting, but at least 6.
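For anyone who wants to count instead of guessing, here is a minimal sketch that sums clip lengths with Python's standard `wave` module. It assumes uncompressed PCM WAV files sitting directly in one folder; the function names and the folder layout are my own, not something from the repo.

```python
import wave
from pathlib import Path


def wav_seconds(path):
    """Return the duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def dataset_stats(wav_dir):
    """Summarise clip durations for every .wav file directly inside wav_dir."""
    durations = [wav_seconds(p) for p in sorted(Path(wav_dir).glob("*.wav"))]
    if not durations:
        raise ValueError(f"no .wav files found in {wav_dir}")
    total = sum(durations)
    return {
        "clips": len(durations),
        "total_hours": total / 3600,
        "mean_s": total / len(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }
```

Calling `dataset_stats("wavs")` (the folder name is a placeholder) gives the clip count, total hours, and the mean, shortest and longest clip in seconds.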
Thank you @el-tocino! How about hparams.py? Did you change anything?
Yep, adjusted outputs per step and batch size to fit my GPU. Sample rate to fit my clips (16000). Learning rate I adjusted depending on how many samples I had.
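Since the sample rate in hparams.py has to match the clips, it may be worth checking that every recording really is at the expected rate before training. A rough sketch with the standard `wave` module follows; the function name is hypothetical and it again assumes PCM WAV files in a single folder.

```python
import wave
from pathlib import Path


def mismatched_clips(wav_dir, expected_rate):
    """Return the names of .wav files whose sample rate differs from expected_rate."""
    bad = []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            if wf.getframerate() != expected_rate:
                bad.append(path.name)
    return bad
```

For example, `mismatched_clips("wavs", 16000)` lists any clips that would need resampling (or a different sample rate setting) before they fit a 16000 Hz configuration.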
Hi guys! After a break, I decided to try again. I have made new recordings that should be good, using the LJSpeech dataset as an example. I have 1.9 hours of recordings, and more are being made. I have started training and am currently at step 37150, but the graph is still empty. What am I doing wrong? I have no idea, and I am a bit desperate here. step-37150-audio.wav sounds a bit robotic, but it is understandable. The demo server, however, does not synthesise anything. Here is my hparams.txt
Thank you for your help!
Hi! Does anyone have any suggestions? Thank you!
Hi! @el-tocino do you have any suggestions?
You're not aligning. Probably bad data. Look up nmstoker's data plotter tool on the Mozilla TTS repo to see how your dataset maps out, and maybe try that repo instead as well.
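This is not that plotter tool, just a rough stand-in for one kind of check you can do on a dataset: compare transcript length with audio length and flag clips whose speaking rate is far from the rest. The function name, the input format and the 50% tolerance are all assumptions of mine; a flagged clip often means a wrong transcript or a lot of silence.

```python
from statistics import median


def flag_outliers(clips, tolerance=0.5):
    """Flag clips with an unusual characters-per-second rate.

    clips: iterable of (clip_id, transcript, seconds) tuples.
    Returns the ids whose rate differs from the median rate by more
    than `tolerance` (a fraction of the median).
    """
    rates = {cid: len(text) / secs for cid, text, secs in clips if secs > 0}
    if not rates:
        return []
    mid = median(rates.values())
    return [cid for cid, rate in rates.items() if abs(rate - mid) > tolerance * mid]
```

The durations can come from any source, for example the `wave` module, and the transcripts from your metadata file.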
Hi! I have now upgraded to a much more powerful computer, and I am currently at step 23300. The WAVs at the checkpoints are very good and easy to understand, but almost anything I put into the demo server comes out unintelligible. Also, most WAV files are 10:25. Why is that? And if I enter a sentence that I recorded, it works, but it adds a weird noise at the end. Thank you for your help!