Do you mean transfer learning by fine-tuning a model on a small dataset? You can find a fine-tuning example that may be helpful: https://github.com/r9y9/ttslearn/blob/59c6f491ce205cab611e171054af43afcc6ca603/extra_recipes/commonvoice/multispk_tacotron2_pwg_20spks/run.sh#L126-L133.
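For reference, warm-starting from a pretrained checkpoint generally looks like the sketch below. This is a minimal illustration, not ttslearn's actual API: the checkpoint path, the `state_dict` key, and the stand-in model are placeholders.

```python
import torch
from torch import nn

# Stand-in for the acoustic model; in practice this would be the
# Tacotron2 model defined in ttslearn (placeholder for illustration).
model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80))

# Warm start from the base (e.g. JSUT-trained) checkpoint.
# The path and key layout are assumptions, not ttslearn's actual format.
ckpt = torch.load("exp/jsut/tacotron2/latest.pth", map_location="cpu")
model.load_state_dict(ckpt["state_dict"], strict=False)

# Fine-tune with a smaller learning rate than from-scratch training,
# so the new data adapts the base weights instead of overwriting them.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```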
I already tried the fine-tuning parameter, but I have gotten no good results so far.
I used JSUT as the base corpus for testing, then added another 1000 files on top of it.
Before fine-tuning, the TTS could speak very well and clearly, but after fine-tuning it seemed to have forgotten the base.
For example, 今日はいい天気ですか。("Is the weather nice today?") was pronounced very well with just the JSUT base, but after fine-tuning it is as if the model forgot how to speak half of the words.
So I was wondering whether there is any other advice on this besides just turning on the fine-tuning parameter.
The additional 1000 files are not of very good quality and contain different emotions, but I expected that, with fine-tuning enabled, the model could at least still speak the sentences roughly as it did before fine-tuning.
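Two common mitigations for this kind of forgetting are freezing the part of the network that learned pronunciation on the clean base corpus and fine-tuning the rest with a much lower learning rate. A minimal sketch, assuming a standard PyTorch setup; the module names are placeholders, not ttslearn's actual classes:

```python
import torch
from torch import nn

# Toy stand-in for a Tacotron2-like model (placeholder modules).
class Tacotron2Like(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(512, 512)  # stand-in for the text encoder
        self.decoder = nn.Linear(512, 80)   # stand-in for the mel decoder

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Tacotron2Like()

# 1) Freeze the encoder so the pronunciation front-end keeps what it
#    learned on the clean base corpus (e.g. JSUT).
for p in model.encoder.parameters():
    p.requires_grad = False

# 2) Fine-tune the remaining parameters with a much smaller learning rate.
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5
)
```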
I see. I would recommend trying good-quality data first; machine learning models depend heavily on their data. I would also recommend cleaning your data as much as you can, e.g., removing leading and trailing noise.
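As a concrete sketch of that cleaning step, leading and trailing silence can be trimmed with librosa; the paths below are examples, and the `top_db` threshold is a starting point to tune per dataset:

```python
import librosa
import soundfile as sf

# Trim leading/trailing segments quieter than 30 dB below the peak.
y, sr = librosa.load("data/utt0001.wav", sr=None)  # keep native sample rate
y_trimmed, _ = librosa.effects.trim(y, top_db=30)
sf.write("data_clean/utt0001.wav", y_trimmed, sr)
```

It helps to listen to a few trimmed files before batch-processing, so the threshold does not clip actual speech.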
Thank you very much, I'll try my best.
Hello,
I went through all the recipes and was looking for how transfer learning could be done in a way similar to what is described here:
transfer learning
but without having to rely on similar (parallel) wav/text being used as input, as in the multispeaker example.
Thank you very much in advance for an answer.
Best regards