Integration of FastSpeech with SqueezeWave vocoder

alokprasad commented 4 years ago

Placeholder for issues related to integrating FastSpeech with SqueezeWave (https://github.com/tianrengao/SqueezeWave), which seems to be considerably faster than WaveGlow.

alokprasad commented 4 years ago

I tried saving mel_postnet_torch (the mel spectrogram) to a .pt file and then using it to generate a WAV with SqueezeWave, but I get the following error:

Traceback (most recent call last):
  File "inference.py", line 87, in <module>
    args.sampling_rate, args.is_fp16, args.denoiser_strength)
  File "inference.py", line 57, in main
    audio = squeezewave.infer(mel, sigma=sigma).float()
  File "/mount/data/SqueezeWave/glow.py", line 261, in infer
    output = self.WN[k]((audio_0, spect))
  File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mount/data/SqueezeWave/glow.py", line 165, in forward
    spect = self.cond_layer(spect)
  File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/alok/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 187, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected 3-dimensional input for 3-dimensional weight [2048, 80, 1], but got 4-dimensional input of size [1, 1, 80, 133] instead

Any idea what could be the issue?
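The shapes in the last line of the traceback point at the cause: a Conv1d whose weight is [2048, 80, 1] expects a 3-D [batch, 80, T] input, but it received [1, 1, 80, 133], which has one extra leading dimension. The sketch below reproduces this with a dummy layer of the same weight shape. It assumes the saved tensor was [1, 80, 133] and that SqueezeWave's inference.py adds a batch dimension on load with torch.unsqueeze(mel, 0), as NVIDIA's WaveGlow inference.py does; the layer and tensors are stand-ins, not SqueezeWave's own objects.

```python
import torch

# Dummy layer with the same weight shape as the failing cond_layer: [2048, 80, 1].
cond_layer = torch.nn.Conv1d(in_channels=80, out_channels=2048, kernel_size=1)
assert list(cond_layer.weight.shape) == [2048, 80, 1]

saved = torch.randn(1, 80, 133)      # assumed contents of test.pt: batch dim still present
loaded = torch.unsqueeze(saved, 0)   # assumed loader step adds another -> [1, 1, 80, 133]

try:
    cond_layer(loaded)
    failed = False
except RuntimeError as err:
    failed = True                    # Conv1d rejects the 4-D input
    print("RuntimeError:", err)

# Saving a 2-D [80, T] tensor instead means the loader's unsqueeze
# produces [1, 80, 133], which the layer accepts.
fixed = torch.unsqueeze(saved.squeeze(0), 0)
out = cond_layer(fixed)
print(list(out.shape))  # [1, 2048, 133]
```

The exact RuntimeError text differs between PyTorch versions, but a 4-D input to Conv1d is rejected in all of them.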

alokprasad commented 4 years ago

This is how I save mel_postnet_torch to produce the input for SqueezeWave:

    melspec = torch.squeeze(mel_postnet_torch, 0)
    torch.save(melspec, "/tmp/test.pt")

test.pt is then the mel spectrogram input to SqueezeWave.
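A more defensive version of the save step above, offered as a sketch: it strips every leading size-1 dimension and checks that exactly 80 mel channels remain before saving, so the file always holds a [80, T] tensor. It assumes SqueezeWave's loader adds the batch dimension itself (as WaveGlow's inference.py does with torch.unsqueeze(mel, 0)); save_mel_for_vocoder is a hypothetical helper name, and the random tensor only stands in for FastSpeech's mel_postnet_torch.

```python
import os
import tempfile

import torch

N_MELS = 80  # the failing Conv1d has weight [2048, 80, 1], i.e. 80 input mel channels


def save_mel_for_vocoder(mel: torch.Tensor, path: str, n_mels: int = N_MELS) -> torch.Tensor:
    """Hypothetical helper: save the mel as a 2-D [n_mels, T] tensor and return it.

    Assumes the vocoder's inference script adds the batch dimension on load.
    """
    mel = mel.detach().cpu()
    # Remove leading singleton dims: [1, 80, T] or [1, 1, 80, T] -> [80, T].
    while mel.dim() > 2 and mel.size(0) == 1:
        mel = mel.squeeze(0)
    if mel.dim() != 2 or mel.size(0) != n_mels:
        raise ValueError(f"expected a [{n_mels}, T] mel, got {list(mel.shape)}")
    torch.save(mel, path)
    return mel


if __name__ == "__main__":
    # Dummy stand-in for FastSpeech's mel_postnet_torch.
    mel_postnet_torch = torch.randn(1, N_MELS, 133)
    path = os.path.join(tempfile.gettempdir(), "test.pt")
    saved = save_mel_for_vocoder(mel_postnet_torch, path)
    print(list(saved.shape))             # [80, 133]
    print(list(torch.load(path).shape))  # [80, 133]
```

Raising on a wrong channel count, rather than reshaping silently, also catches a mel that was saved time-major as [T, 80].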

alokprasad commented 4 years ago

@xcmyz

The following text, "Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition in being comparatively modern", was synthesized astonishingly fast on a single CPU core (no GPU), with model loading time included.

Generated audio duration: 11.5 s, produced in around 3.83 s in total.

Mel calculation: 2.827802896499634 s

SqueezeWave vocoder: 1.0016820430755615 s
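These timings can be summarized as a real-time factor (RTF): synthesis time divided by the duration of the audio produced, where a value below 1.0 means faster than real time. A small sketch using only the numbers reported in this comment; because model loading is included, it is a pessimistic figure for steady-state synthesis.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Synthesis wall-clock time divided by the duration of the audio produced."""
    return synthesis_seconds / audio_seconds


# Figures reported above (single CPU core, model loading included).
mel_time = 2.827802896499634        # FastSpeech mel calculation, seconds
vocoder_time = 1.0016820430755615   # SqueezeWave vocoder, seconds
audio_duration = 11.5               # seconds of generated audio

total = mel_time + vocoder_time
rtf = real_time_factor(total, audio_duration)

print(f"total: {total:.2f} s")                        # total: 3.83 s
print(f"RTF: {rtf:.3f} ({1 / rtf:.1f}x real time)")   # RTF: 0.333 (3.0x real time)
print(f"vocoder share: {vocoder_time / total:.0%}")   # vocoder share: 26%
```

With these numbers the pipeline runs at roughly 3x real time, and in this breakdown mel generation, not the vocoder, accounts for about three quarters of the total.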

xcmyz/FastSpeech, issue #59