The FastSpeech project (https://github.com/xcmyz/FastSpeech) generates mel spectrograms from text quite fast.
I am trying to integrate FastSpeech mel generation with the SqueezeWave vocoder instead of using mel2samp.py to generate the mel .pt files.
I saved mel_postnet_torch (the mel spectrogram) to a .pt file and then used that file to generate a wav
with SqueezeWave, but I get the following error:
Traceback (most recent call last):
File "inference.py", line 87, in <module>
args.sampling_rate, args.is_fp16, args.denoiser_strength)
File "inference.py", line 57, in main
audio = squeezewave.infer(mel, sigma=sigma).float()
File "/mount/data/SqueezeWave/glow.py", line 261, in infer
output = self.WN[k]((audio_0, spect))
File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/mount/data/SqueezeWave/glow.py", line 165, in forward
spect = self.cond_layer(spect)
File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 187, in forward
self.padding, self.dilation, self.groups)
RuntimeError: Expected 3-dimensional input for 3-dimensional weight [2048, 80, 1], but got 4-dimensional input of size [1, 1, 80, 133] instead
Any idea what could be the issue?
To save the mel, I added this line right after https://github.com/xcmyz/FastSpeech/blob/master/synthesis.py#L66: torch.save(mel_postnet_torch, "filename.pt")
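For reference, the shape mismatch can be reproduced outside both projects. This is only a minimal sketch under assumptions: `cond_layer` here is a stand-in `Conv1d(80, 2048, kernel_size=1)` inferred from the weight shape `[2048, 80, 1]` in the error, and `mel` is random data with the `[1, 1, 80, 133]` shape from the error, not the real FastSpeech output. I have not verified where the extra dimension gets added in the real pipeline.

```python
import torch

# Random stand-in for the mel tensor; shape copied from the error message.
mel = torch.randn(1, 1, 80, 133)

# Stand-in for the conditioning layer (assumption: a 1x1 Conv1d from
# 80 mel channels to 2048 channels, inferred from weight [2048, 80, 1]).
cond_layer = torch.nn.Conv1d(80, 2048, kernel_size=1)

# A Conv1d layer rejects the 4-dimensional input.
try:
    cond_layer(mel)
    failed = False
except RuntimeError as err:
    failed = True
    print("4-D input fails:", err)

# Dropping one singleton dimension gives [batch, n_mels, frames],
# which the same layer accepts.
mel_3d = mel.squeeze(1)
out = cond_layer(mel_3d)
print(tuple(mel_3d.shape), "->", tuple(out.shape))
```

So it looks like the tensor reaching the vocoder has one singleton dimension too many. Possibly the saved tensor already carries a batch dimension and another one gets added at load time, but that is a guess on my part.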