Other vocoder or any possible improvement

winddori2002 / TriAAN-VC

TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion

MIT License

Hi, thanks for the great work! I have trained the model on a custom dataset with ten times more speakers. However, the results are only marginally better. Self-regeneration works well, but the conversion cases do not reach good performance. Is the speaker encoder the bottleneck? Should I increase its dimensions to capture more variety? If self-regeneration is good, then the vocoder is presumably working well and does not need to be changed, right? And if we did want to switch to NVIDIA BigVGAN, should I retrain the network with BigVGAN's parameters?

Other vocoder or any possible improvement #10