rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

Is it possible to finetune a medium quality checkpoint to create a high quality checkpoint? #647

Open · yilmazay74 opened this issue 5 days ago

yilmazay74 commented 5 days ago

Hi,

So far I have created some medium-quality voices using the Turkish sample checkpoints provided on the 'download voices' page. There is a nice example that was trained to approximately 5,600 steps, and when I fine-tune it with about 300 of our own samples up to 10k steps, the results are not bad. However, we want much better quality, so I wanted to try high quality.

When I tried to fine-tune a high-quality model starting from the sample medium-quality voice, it threw a lot of mismatch errors. So it looks like it is not possible to fine-tune from a checkpoint of a different quality level, but I am not sure.

Currently I am trying to fine-tune an English high-quality checkpoint at 2,000 steps with our own Turkish samples (about 300 files). Since there is no high-quality sample checkpoint for Turkish, I thought English would be the best place to start. However, I am not very optimistic about it.

Could anyone guide me on the best way to create better-quality TTS models? Thanks in advance.
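The mismatch errors above are consistent with the medium and high presets presumably using different layer sizes, so some tensors in a medium checkpoint do not have the shapes a high-quality model expects (PyTorch reports these as `size mismatch for ...` when loading a state dict). A minimal sketch for inspecting this, assuming each checkpoint has already been reduced to a dict of parameter name to shape (for a PyTorch Lightning `.ckpt` file, something like `{k: tuple(v.shape) for k, v in ckpt["state_dict"].items()}`). The function name `diff_shapes` and all parameter names and shapes below are hypothetical toy values, not taken from real Piper checkpoints.

```python
from typing import Dict, List, Mapping, Tuple

Shape = Tuple[int, ...]


def diff_shapes(src: Mapping[str, Shape], dst: Mapping[str, Shape]) -> Dict[str, List[str]]:
    """Compare parameter shapes of a source checkpoint with a target model.

    Hypothetical helper. Returns which target parameters could be copied
    as-is ("match"), which exist on both sides with different shapes
    ("mismatch"), which the target has but the source lacks ("missing"),
    and which only the source has ("unexpected").
    """
    result: Dict[str, List[str]] = {"match": [], "mismatch": [], "missing": [], "unexpected": []}
    for name, shape in dst.items():
        if name not in src:
            result["missing"].append(name)
        elif tuple(src[name]) == tuple(shape):
            result["match"].append(name)
        else:
            # Same name, different size: a strict load would fail here.
            result["mismatch"].append(name)
    # Parameters present only in the source checkpoint.
    result["unexpected"] = [name for name in src if name not in dst]
    return result


if __name__ == "__main__":
    # Toy, made-up shapes for illustration only (NOT real Piper values).
    medium = {
        "enc.emb.weight": (256, 192),
        "dec.conv_pre.weight": (256, 192, 7),
        "dec.old.bias": (8,),
    }
    high = {
        "enc.emb.weight": (256, 192),
        "dec.conv_pre.weight": (512, 192, 7),
        "dec.new.weight": (64,),
    }
    for category, names in diff_shapes(medium, high).items():
        print(f"{category}: {names}")
```

Copying only the "match" entries into a freshly initialized model is a common partial warm-start technique; whether it is supported by Piper's training scripts, or actually helps quality compared with starting from a same-quality checkpoint in another language, is an open assumption here.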