Size mismatch between predicted mel and preprocessed mel

Hi, I am trying to combine "deepvoice3-pytorch" with "wavenet_vocoder", both of which are your work, and I am really grateful for them.

I extract the mel output from deepvoice3-pytorch in its synthesis.py, at line 108:

    with torch.no_grad():
        mel_outputs, linear_outputs, alignments, done = model(
            sequence, text_positions=text_positions, speaker_ids=speaker_ids)

    linear_output = linear_outputs[0].cpu().data.numpy()
    spectrogram = audio._denormalize(linear_output)
    alignment = alignments[0].cpu().data.numpy()
    mel = mel_outputs[0].cpu().data.numpy()
    mel = audio._denormalize(mel)

I save the mel output to a .npy file and try to use it in wavenet_vocoder, but I get a size mismatch. The preprocessed mel has shape (X, 80) and can be synthesized to a waveform, whereas the predicted mel from deepvoice3 has shape (80, X) and raises a size-mismatch error.

At first I thought it might be a transpose problem, so I changed the predicted mel from (80, X) to (X, 80) with .T, but that didn't work either.
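For reference, the shape check and transpose described above can be sketched roughly as follows. This is only an illustration under assumptions: it uses NumPy with a zero-filled placeholder array instead of real deepvoice3 output, and `to_time_major` and `NUM_MELS` are hypothetical names that do not come from either repository.

```python
import numpy as np

NUM_MELS = 80  # number of mel channels mentioned in this post


def to_time_major(mel, num_mels=NUM_MELS):
    """Return `mel` with shape (frames, num_mels).

    Hypothetical helper: transposes the array only when it arrives
    as (num_mels, frames); raises if neither axis matches num_mels.
    """
    if mel.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {mel.shape}")
    if mel.shape[1] == num_mels:
        return mel  # already (frames, num_mels)
    if mel.shape[0] == num_mels:
        return mel.T  # (num_mels, frames) -> (frames, num_mels)
    raise ValueError(f"no axis of size {num_mels} in shape {mel.shape}")


# Placeholder standing in for a predicted mel of shape (80, X).
predicted = np.zeros((NUM_MELS, 123), dtype=np.float32)
fixed = to_time_major(predicted)
print(predicted.shape, "->", fixed.shape)
```

This only makes the axis order agree with the (X, 80) layout of the preprocessed mel. It does not check anything else about the array, so it does not by itself explain why the error persists after transposing.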

Could you please tell me why this happens, and how to modify the predicted mel from deepvoice3 so that it matches the input expected by wavenet_vocoder?

I would really like to understand this. Thanks.

r9y9 / wavenet_vocoder
