jik876 / hifi-gan

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
MIT License

Swap axes when running inference_e2e.py with TransformerTTS #110

Open alexvwegen opened 2 years ago

alexvwegen commented 2 years ago

Hi!

I was trying to run end-to-end inference with mels I generated with as-ideas/TransformerTTS and ran into the following error:

RuntimeError: Given groups=1, weight of size 256 80 7, expected input[1, 197, 80] to have 80 channels, but got 197 channels instead

I was able to fix it quickly by adding the following after line 49 of inference_e2e.py:

#line 48 ...
#line 49 x = torch.FloatTensor(x).to(device)
# The generator expects the 80 mel channels on the first axis; swap if the mel is [frames, 80].
if x.shape[0] != 80:
    print("shape mismatch. swapping axes.")
    x = torch.transpose(x, 0, 1)
#line 50 y_g_hat = generator(x)
#line 51 ...

This works fine for me, but I wonder whether it is a good way to handle the issue, and whether anyone else has run into this when working with mels from TransformerTTS. If there is a better or cleaner solution, I'd be happy to hear about it.
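For reference, a slightly more general version of the same check could look like the sketch below. The error message suggests the generator's first convolution wants the mel channels ahead of the frame axis, while the TransformerTTS mels here arrive as [frames, 80]. The helper only inspects the shape and reports what needs to change; `mel_layout` is a hypothetical name, not part of hifi-gan, and the assumption that the config object `h` exposes `num_mels` should be checked against your own config.

```python
def mel_layout(shape, num_mels=80):
    """Decide how to bring a mel of the given shape to [batch, num_mels, frames].

    Returns (needs_transpose, needs_batch_dim). Hypothetical helper, not hifi-gan API.
    """
    dims = tuple(shape)
    if len(dims) not in (2, 3):
        raise ValueError(f"expected a 2-D or 3-D mel, got shape {dims}")

    # A 2-D mel has no batch axis yet.
    needs_batch_dim = len(dims) == 2

    channels_axis, frames_axis = dims[-2], dims[-1]
    if channels_axis == num_mels:
        # Already channels-first. If both axes equal num_mels the layout is
        # ambiguous; this sketch assumes channels-first in that case.
        needs_transpose = False
    elif frames_axis == num_mels:
        # Mel is [..., frames, num_mels], so the last two axes must be swapped.
        needs_transpose = True
    else:
        raise ValueError(f"no axis of size {num_mels} in the last two dims of {dims}")

    return needs_transpose, needs_batch_dim


# Assumed use in inference_e2e.py, right after
# x = torch.FloatTensor(x).to(device):
#
#     swap, add_batch = mel_layout(x.shape, h.num_mels)  # h.num_mels: assumption
#     if swap:
#         x = x.transpose(-1, -2)
#     if add_batch:
#         x = x.unsqueeze(0)
```

Compared with checking `x.shape[0]` alone, this also handles mels that already carry a batch axis, and it raises an error instead of silently transposing when neither axis matches the expected number of mel channels.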