winddori2002 / TriAAN-VC

TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion
MIT License

Other vocoder or any possible improvement #10

Closed Souvic closed 1 year ago

Souvic commented 1 year ago

Hi, thanks for the great work! I have trained the model on a custom dataset with ten times more speakers. However, the results are only marginally better. Self-regeneration works well, but conversion fails to achieve good performance. Is it about the speaker encoder? Should I increase the dimensions to capture more variety? If self-regeneration is good, then the vocoder part is probably working well, right? So there is no need to change the vocoder, right? Even so, if we want to switch to NVIDIA BigVGAN, should I retrain the network with BigVGAN's parameters?

winddori2002 commented 1 year ago

Hi,

If you want to change the vocoder settings, it is necessary to retrain. Increasing the model size may help if you have enough data per speaker. Generally, poor results occur when silence is not removed.
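
For readers unsure what silence removal looks like in practice, here is a minimal, dependency-free sketch. It is not the preprocessing used in this repository: the function name `remove_silence`, the 20 ms frame size, and the 40 dB threshold are all assumptions chosen for illustration. It drops every frame whose RMS energy falls more than `top_db` below the loudest frame.

```python
import math


def remove_silence(samples, sample_rate, frame_ms=20, top_db=40.0):
    """Drop frames whose RMS is more than `top_db` dB below the loudest frame.

    Hypothetical helper for illustration; `frame_ms` and `top_db` are
    assumed defaults, not values taken from TriAAN-VC.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    # Split the waveform into non-overlapping frames.
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    if not frames:
        return []

    # Root-mean-square energy of each frame.
    rms = [math.sqrt(sum(x * x for x in f) / len(f)) for f in frames]
    peak = max(rms)
    if peak == 0.0:
        return []  # the whole clip is digital silence

    # Convert the dB margin into a linear amplitude threshold.
    threshold = peak * 10 ** (-top_db / 20)

    kept = []
    for frame, energy in zip(frames, rms):
        if energy >= threshold:
            kept.extend(frame)
    return kept
```

Note that this removes quiet frames anywhere in the clip, including pauses between words. If only leading and trailing silence should go, a library routine such as `librosa.effects.trim` may be a better fit. The threshold usually needs tuning per dataset.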