Hi,

Thanks for this great implementation.

I trained Tacotron and used WaveRNN as the vocoder, and the result was good. Now I want to use HiFi-GAN as the vocoder instead, so I cloned this project and ran it on the mel-spectrogram produced by Tacotron. But the result was very noisy! You can listen to the result here; some prints are as follows:
```python
import torch

# Synthesize a mel-spectrogram with Tacotron
# (synthesize_spectrograms returns a list of numpy arrays, one per input text)
specs = synthesizer.synthesize_spectrograms(texts, embeds)
x = specs[0]
print(x)

with torch.no_grad():
    x = torch.from_numpy(x).to(device)
    x = x.unsqueeze(0)  # (num_mels, T) -> (1, num_mels, T)
    print(x.shape)
    y_g_hat = generator(x)  # HiFi-GAN generator
    audio = y_g_hat.squeeze()
    audio = audio.cpu().numpy()
```
Then I took the wav that WaveRNN produced for this same mel-spectrogram and ran HiFi-GAN on that wav (so HiFi-GAN recomputes the mel itself), and the result was great. Some prints are as follows:
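For reference, the wav path was roughly this. It is a sketch based on HiFi-GAN's own `inference.py`, using `mel_spectrogram` and `MAX_WAV_VALUE` from its `meldataset.py`; the numbers are the `config_v1.json` values (adjust to your checkpoint) and the filename is hypothetical:

```python
import torch
from scipy.io.wavfile import read
from meldataset import mel_spectrogram, MAX_WAV_VALUE

# Load the wav that WaveRNN produced (hypothetical filename)
sr, wav = read("wavernn_output.wav")
wav = torch.FloatTensor(wav / MAX_WAV_VALUE).unsqueeze(0)

# Let HiFi-GAN recompute the mel with its own parameters (config_v1.json values)
mel = mel_spectrogram(wav, n_fft=1024, num_mels=80, sampling_rate=22050,
                      hop_size=256, win_size=1024, fmin=0, fmax=8000)
print(mel.shape)  # (1, 80, T)

with torch.no_grad():
    audio = generator(mel.to(device)).squeeze().cpu().numpy()
```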
So HiFi-GAN itself works well, but I think that when feeding Tacotron's mel-spectrogram directly we need to change something (hop_size, maybe?). But what, and how?
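To make the question concrete, below is the checklist I am using to compare the two pipelines. It is only a sketch of what I suspect has to match, not a confirmed fix; the Tacotron attribute names assume Real-Time-Voice-Cloning's `synthesizer/hparams.py` and may differ in other forks, and the config path is hypothetical:

```python
import json

# HiFi-GAN side: the parameters the generator was trained with
with open("config_v1.json") as f:  # hypothetical path; use your checkpoint's config
    h = json.load(f)

# Tacotron side: assuming Real-Time-Voice-Cloning's hparams module;
# adjust the import and attribute names if your fork differs
from synthesizer.hparams import hparams as hp

for hifi_key, taco_val in [
    ("sampling_rate", hp.sample_rate),
    ("num_mels",      hp.num_mels),
    ("n_fft",         hp.n_fft),
    ("hop_size",      hp.hop_size),
    ("win_size",      hp.win_size),
    ("fmin",          hp.fmin),
    ("fmax",          hp.fmax),
]:
    print(f"{hifi_key}: hifi-gan={h[hifi_key]}, tacotron={taco_val}")

# Besides these, the amplitude scale matters too: HiFi-GAN's meldataset.py
# uses natural-log mels (log(clamp(mel, min=1e-5))), while this Tacotron
# emits dB-scale mels normalized to a fixed range, so a conversion would be
# needed even if all the numbers above agree.
```

If any of these differ (e.g. a 16 kHz Tacotron against a 22.05 kHz HiFi-GAN checkpoint), that alone could explain the noise, I think.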