First of all, thanks for your contributions on VITS2!
I was wondering if you'd have any tips on how to train a different (English) voice than the LJSpeech one. I don't have access to very powerful GPUs, so it'd be great if it were possible to do fine-tuning on top of the 64k-step checkpoint that you've already posted (again, thanks!).
Possibly something similar to https://github.com/nivibilla/efficient-vits-finetuning?
You could simply apply RVC to the generated audio. But if you want a different accent, that's another story.
I wish we had a LoRA or LoRA-like solution for this, though.
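In the meantime, a cheaper alternative to full fine-tuning is to load the pretrained checkpoint and freeze most of the network, updating only a small subset of parameters. Here's a minimal PyTorch sketch of that idea; the checkpoint name `G_64000.pth`, the stand-in module layout, and the choice of which layers to unfreeze are all illustrative assumptions, not the actual VITS2 code:

```python
# Hedged sketch: partial fine-tuning from a pretrained checkpoint.
# The model below is a toy stand-in, NOT the real VITS2 generator.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(80, 256),   # stands in for the bulk of the pretrained network
    nn.ReLU(),
    nn.Linear(256, 80),   # stands in for the layers we want to adapt
)

# Load the pretrained weights; strict=False tolerates small mismatches.
# (Checkpoint name and dict key are assumptions.)
# checkpoint = torch.load("G_64000.pth", map_location="cpu")
# model.load_state_dict(checkpoint["model"], strict=False)

# Freeze everything, then unfreeze only the final layer.
for p in model.parameters():
    p.requires_grad = False
for p in model[-1].parameters():
    p.requires_grad = True

# Optimize only the trainable subset, with a small fine-tuning LR.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=2e-5)
```

This keeps the optimizer state and gradients small enough for a modest GPU, at the cost of less capacity to change the voice; for a new accent you would likely need to unfreeze more of the network.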