tianrengao / SqueezeWave

Using Spectrogram generated from FastSpeech with SqueezeWave #5

Open alokprasad opened 4 years ago

alokprasad commented 4 years ago

The FastSpeech project (https://github.com/xcmyz/FastSpeech) generates mel spectrograms from text quite fast. I am trying to integrate FastSpeech mel generation with the SqueezeWave vocoder instead of using mel2samp.py to generate the mels (.pt files).

I tried saving mel_postnet_torch (the mel spectrogram) to a .pt file and then used it to generate a wav with SqueezeWave, but I get the following error:

Traceback (most recent call last):
  File "inference.py", line 87, in <module>
    args.sampling_rate, args.is_fp16, args.denoiser_strength)
  File "inference.py", line 57, in main
    audio = squeezewave.infer(mel, sigma=sigma).float()
  File "/mount/data/SqueezeWave/glow.py", line 261, in infer
    output = self.WN[k]((audio_0, spect))
  File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mount/data/SqueezeWave/glow.py", line 165, in forward
    spect = self.cond_layer(spect)
  File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 187, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected 3-dimensional input for 3-dimensional weight [2048, 80, 1], but got 4-dimensional input of size [1, 1, 80, 133] instead

Any idea what could be the issue?

I added a line to save the computed mel right after https://github.com/xcmyz/FastSpeech/blob/master/synthesis.py#L66:

torch.save(mel_postnet_torch, "filename.pt")
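The shapes in the error message suggest one possible cause: the conditioning layer is a 1-D convolution that wants a 3-D input of (batch, 80, frames), while the tensor that reached it is (1, 1, 80, 133), so there seems to be one batch dimension too many. A minimal sketch of checking and removing the extra dimensions before saving is below. It assumes (not verified here) that the loading script adds a batch dimension itself, so the saved file should hold a 2-D (80, frames) tensor. The helper name `to_2d_mel` and the small `Conv1d` stand-in for the conditioning layer are hypothetical and only for illustration.

```python
import torch


def to_2d_mel(mel: torch.Tensor, n_mel_channels: int = 80) -> torch.Tensor:
    """Hypothetical helper: reduce a mel tensor to shape (n_mel_channels, frames)."""
    # Drop leading singleton dimensions, e.g. (1, 1, 80, 133) -> (80, 133).
    while mel.dim() > 2 and mel.size(0) == 1:
        mel = mel.squeeze(0)
    if mel.dim() != 2:
        raise ValueError(f"cannot reduce shape {tuple(mel.shape)} to 2-D")
    # If the tensor is laid out as (frames, n_mel), put the mel axis first.
    if mel.size(0) != n_mel_channels and mel.size(1) == n_mel_channels:
        mel = mel.t()
    if mel.size(0) != n_mel_channels:
        raise ValueError(f"expected {n_mel_channels} mel channels, got {tuple(mel.shape)}")
    return mel.contiguous()


# Dummy tensor with the shape reported in the error message.
mel_4d = torch.zeros(1, 1, 80, 133)
mel_2d = to_2d_mel(mel_4d)      # shape (80, 133)
batched = mel_2d.unsqueeze(0)   # shape (1, 80, 133), what a Conv1d expects

# Small stand-in for the conditioning layer (the real one has 2048 output channels).
cond_layer = torch.nn.Conv1d(80, 8, kernel_size=1)
out = cond_layer(batched)       # shape (1, 8, 133)

print(tuple(mel_2d.shape), tuple(batched.shape), tuple(out.shape))
```

If this is the cause, saving the reduced tensor, for example `torch.save(to_2d_mel(mel_postnet_torch), "filename.pt")`, may be enough. If the loading code does not add a batch dimension, the 3-D `batched` form would be the one to pass instead.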