Souvic closed this issue 1 year ago
Hi,
If you want to change the vocoder settings, you will need to retrain. Increasing the model size can help if you have enough data per speaker. In general, poor results tend to appear when silence is not removed from the audio.
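In case it helps, here is a minimal sketch of what removing leading and trailing silence can look like. It is not this repository's preprocessing: the function name `trim_silence`, the frame length, and the -40 dB threshold are all assumptions chosen for illustration, and it expects samples as floats in [-1, 1]. A library routine such as `librosa.effects.trim` does a similar job in practice.

```python
import math


def trim_silence(samples, frame_len=400, threshold_db=-40.0):
    """Drop leading/trailing frames whose RMS level is below threshold_db.

    Hypothetical helper for illustration; frame_len and threshold_db are
    assumed values and would need tuning for a real dataset.
    """
    def frame_db(frame):
        # RMS level of one frame in dB relative to full scale (1.0).
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        return 20.0 * math.log10(rms) if rms > 0 else float("-inf")

    # Split the signal into non-overlapping frames (the last may be shorter).
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    voiced = [i for i, f in enumerate(frames) if frame_db(f) > threshold_db]
    if not voiced:
        return []  # the whole clip is below the threshold

    start = voiced[0] * frame_len
    end = min((voiced[-1] + 1) * frame_len, len(samples))
    return samples[start:end]


# Usage: 800 samples of silence on each side of a 1600-sample tone.
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 220 * n / 16000) for n in range(1600)]
trimmed = trim_silence(silence + tone + silence)
```

This only trims the edges of a clip. Long pauses inside an utterance would need a separate splitting step.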
Hi, thanks for the great work! I have now trained the model on custom datasets with ten times more speakers, but the results are only marginally better. Self-regeneration works well, but conversion does not reach good quality. Is the speaker encoder the problem? Should I increase its dimensions to capture more variety? If self-regeneration is good, then the vocoder part presumably works well and does not need to change, right? And if we do want to switch the vocoder to NVIDIA BigVGAN, should I retrain the network with BigVGAN's parameters?