Open loretoparisi opened 3 years ago
Thanks for your suggestion. It is supported now and indeed the audio quality is much better!
@ming024 super, let me try it out. How can I choose it for the English voice? Thanks!
Hi, thanks for your efforts in putting this amazing repo together! With your latest changes, I get
FileNotFoundError: [Errno 2] No such file or directory: 'hifigan/config.json'
when running synthesize.py. Would you mind adding the hifigan config as well?
@loretoparisi In my experience vocoders are generally independent of, or only weakly dependent on, the language, so feel free to try it.
@chrr I somehow forgot to upload the hifigan/ directory. It should be fixed now.
Hey @ming024, I am working on Arabic, which has a different script than English. Will that affect the results? Also, should I use the universal hifigan model?
@zaidalyafeai I believe the universal HiFi-GAN yields the best results for unknown speakers. I also think there should not be a large performance drop when using the pretrained vocoders on different languages, as long as the same preprocessing hyperparameters are used.
Thanks @ming024, I tested both vocoders and indeed the universal one is much better. Which preprocessing hyperparameters affect the vocoders the most?
@zaidalyafeai the preprocessing parameters should match those of the pretrained vocoders, or there may be strange results.
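For example, here is a minimal sketch of checking that the acoustic model's preprocessing matches the vocoder config before synthesis. The file paths and key names follow the usual FastSpeech2 `preprocess.yaml` and HiFi-GAN `config.json` layouts and are assumptions to adapt to your own setup:

```python
# Compare FastSpeech2 preprocessing hyperparameters against the pretrained
# HiFi-GAN config; mismatches here are a common cause of noisy output.
# Paths and key names are assumptions -- adjust to your checkout.
import json
import yaml

with open("config/LJSpeech/preprocess.yaml") as f:
    pre = yaml.safe_load(f)["preprocessing"]
with open("hifigan/config.json") as f:
    voc = json.load(f)

checks = {
    "sampling_rate": (pre["audio"]["sampling_rate"], voc["sampling_rate"]),
    "n_fft":         (pre["stft"]["filter_length"],  voc["n_fft"]),
    "hop_length":    (pre["stft"]["hop_length"],     voc["hop_size"]),
    "win_length":    (pre["stft"]["win_length"],     voc["win_size"]),
    "n_mels":        (pre["mel"]["n_mel_channels"],  voc["num_mels"]),
    "mel_fmin":      (pre["mel"]["mel_fmin"],        voc["fmin"]),
    "mel_fmax":      (pre["mel"]["mel_fmax"],        voc["fmax"]),
}
for name, (acoustic, vocoder) in checks.items():
    if acoustic != vocoder:
        print(f"Mismatch in {name}: acoustic model uses {acoustic}, vocoder expects {vocoder}")
```

The sampling rate, hop length, number of mel channels, and mel frequency range are the parameters most likely to break the vocoder if they differ.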
@zaidalyafeai @ming024 Did you use the pre-trained universal vocoder that already exists, or did you train it from scratch when you used it, for example, with Arabic data? I am now trying to add a pretrained VITS vocoder to FastSpeech2 (using the same preprocessing hyperparameters), but I only get noisy output. Thanks in advance for your answer!
No, you cannot do that. The standard HiFi-GAN vocoder is trained to map mel-spectrograms to waveforms, so it can be used as a vocoder for FastSpeech2. The VITS decoder (which has nearly the same structure as HiFi-GAN) is trained to generate waveforms from the VITS latent variable "z", not from mel-spectrograms, so the two are completely different.
HiFi-GAN has SOTA results in wav generation from mel spectrograms. Is it possible to add support for the hifigan model after the mel generation, in order to create the wave file? Or would some additional adaptation be needed?
In the case of end-to-end inference with HiFi-GAN, the generation code would look like the following, where mel_torch is our mel spectrogram.
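A minimal sketch of what such end-to-end vocoding could look like (this is not the original poster's snippet; the `Generator`/`AttrDict` imports, the checkpoint path, and the dummy `mel_torch` below follow the reference HiFi-GAN repository and are assumptions to adapt to your setup):

```python
# Vocode a mel-spectrogram with a pretrained HiFi-GAN generator.
import json
import torch
from env import AttrDict          # from the HiFi-GAN repo
from models import Generator      # from the HiFi-GAN repo

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the vocoder config and the universal generator checkpoint.
with open("hifigan/config.json") as f:
    h = AttrDict(json.load(f))

generator = Generator(h).to(device)
state = torch.load("hifigan/generator_universal.pth.tar", map_location=device)
generator.load_state_dict(state["generator"])
generator.eval()
generator.remove_weight_norm()

# In practice mel_torch comes from the acoustic model (e.g. FastSpeech2);
# a random tensor of the right shape is used here only to keep the sketch
# self-contained. Shape: (batch, n_mels, frames).
mel_torch = torch.randn(1, h.num_mels, 200, device=device)

with torch.no_grad():
    wav = generator(mel_torch)            # (batch, 1, samples), roughly in [-1, 1]
    wav = wav.squeeze(1).cpu().numpy()
```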