winddori2002 / TriAAN-VC

TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion
MIT License

Custom Training #2

carlosedubarreto closed 1 year ago

carlosedubarreto commented 1 year ago

I tested with the provided models and it worked well, but the voice from my own reference was not as good as when we use the samples that come with the code.

I guess the samples are from the VCTK dataset, on which the provided model was trained.

So I think that to get a custom voice that is more similar to the reference voice, we would need to train on the reference voice. Is that correct?

And if so, what minimum amount of reference data would you suggest training on to get a better result?

Thanks a lot, and your work is amazing!

winddori2002 commented 1 year ago

Hi,

As you said, it will achieve better performance if you train on the reference voices or on some data similar to the references.

Actually, I have not tried a "fine-tuning" strategy for fast adaptation, so it is difficult to offer guidelines for that strategy.

However, I think fine-tuning would require less data.

In addition, the results are strongly affected by the vocoder. Since the vocoder is pre-trained on the VCTK dataset, results should be better if you use a general version of the vocoder or re-train the vocoder.

Thanks.

carlosedubarreto commented 1 year ago

Thanks a lot for the info, I had no idea about the vocoder. 🦾
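
The thread does not give a concrete figure for how much reference audio is enough for fine-tuning, so a reasonable first step is simply to measure what you have. Below is a minimal sketch, not part of the TriAAN-VC repository, that totals the duration of the WAV files in a folder and lists their sample rates using only the Python standard library. The function name `summarize_wavs` and the folder name `my_reference_wavs` are hypothetical, and the sketch assumes uncompressed PCM `.wav` files, which is all the `wave` module can read.

```python
import sys
import wave
from pathlib import Path


def summarize_wavs(directory):
    """Return (file_count, total_seconds, sample_rates) for PCM .wav files under directory."""
    file_count = 0
    total_seconds = 0.0
    sample_rates = set()
    for path in sorted(Path(directory).rglob("*.wav")):
        # The stdlib wave module only reads uncompressed PCM WAV files.
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            total_seconds += wav.getnframes() / rate
            sample_rates.add(rate)
        file_count += 1
    return file_count, total_seconds, sample_rates


if __name__ == "__main__":
    # Hypothetical default folder name; pass your own path as the first argument.
    folder = sys.argv[1] if len(sys.argv) > 1 else "my_reference_wavs"
    count, seconds, rates = summarize_wavs(folder)
    print(f"{count} files, {seconds / 60:.1f} minutes, sample rates: {sorted(rates)}")
```

Knowing the total minutes and whether the sample rates are mixed gives a concrete starting point for fine-tuning experiments. The script does not say how much data is sufficient; that would have to be found empirically.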