seungwonpark / melgan

MelGAN vocoder (compatible with NVIDIA/tacotron2)
http://swpark.me/melgan/
BSD 3-Clause "New" or "Revised" License

Wrong implementation of Generator #22

Closed by seungwonpark 4 years ago

seungwonpark commented 5 years ago

The last layer should be:

nn.utils.weight_norm(nn.Conv1d(32, 1, kernel_size=7, stride=1, padding=3)),

not:

nn.utils.weight_norm(nn.ConvTranspose1d(32, 1, kernel_size=7, stride=1, padding=3)),

omg...
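A quick check of the two layers, as a rough sketch rather than the repository's actual code: the batch size and sequence length are made-up values, and the `weight_norm` wrapper is left out for brevity. With kernel_size=7, stride=1, padding=3, both layers keep the time dimension unchanged, which is probably why the mistake raised no shape error.

```python
import torch
import torch.nn as nn

# Hypothetical input: batch of 2, 32 channels, 100 time steps.
x = torch.randn(2, 32, 100)

# Intended last layer (weight_norm wrapper omitted in this sketch).
conv = nn.Conv1d(32, 1, kernel_size=7, stride=1, padding=3)
# Layer that was used by mistake.
deconv = nn.ConvTranspose1d(32, 1, kernel_size=7, stride=1, padding=3)

out_conv = conv(x)
out_deconv = deconv(x)

# Both outputs have shape (batch, 1, time) with the time axis preserved.
print(out_conv.shape, out_deconv.shape)
# The weight layouts differ: (out, in, k) for Conv1d, (in, out, k) for ConvTranspose1d.
print(conv.weight.shape, deconv.weight.shape)
```

Because the weight tensors are laid out differently, a checkpoint trained with one layer type cannot be loaded directly into the other.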

seungwonpark commented 5 years ago

Working on this at fix/22 branch.

I'll train the model from scratch and upload the audio samples again. I wonder why the previous model worked so well.

bob80333 commented 5 years ago

As for why it worked so well: this article trains a CNN image classifier with Conv2d and with ConvTranspose2d, using the same hyperparameters for both, and gets almost identical results.
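One plausible reason, sketched in plain Python under simplifying assumptions (single channel, no bias, stride 1; the helper names `conv1d` and `conv_transpose1d` are made up for this example and are not PyTorch functions): at stride 1, a transposed convolution with padding p gives the same result as an ordinary convolution with the kernel reversed and padding k - 1 - p. For kernel_size=7 and padding=3, that padding is again 3, so both layer types can represent the same set of functions and training can simply learn the mirrored kernel.

```python
def conv1d(x, w, padding):
    """Stride-1 cross-correlation, which is what 'convolution' layers compute."""
    xp = [0] * padding + list(x) + [0] * padding
    k = len(w)
    return [sum(w[j] * xp[i + j] for j in range(k))
            for i in range(len(xp) - k + 1)]


def conv_transpose1d(x, w, padding):
    """Stride-1 transposed convolution: scatter each input sample, then crop."""
    k = len(w)
    full = [0] * (len(x) + k - 1)
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            full[i + j] += xi * wj
    return full[padding:len(full) - padding]


# Made-up integer data so the comparison is exact.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
w = [1, -2, 3, 0, 5, -1, 2]  # kernel_size = 7
p = 3

transposed = conv_transpose1d(x, w, p)
flipped = conv1d(x, w[::-1], len(w) - 1 - p)  # reversed kernel, padding k-1-p
plain = conv1d(x, w, p)

print(transposed == flipped)  # same values
print(transposed == plain)    # differs for the same (asymmetric) kernel
```

In the multi-channel case the input and output channel axes of the weight are swapped as well, but the argument is the same. This only holds at stride 1; with a larger stride the two layer types do genuinely different things.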

seungwonpark commented 5 years ago

The new version seems to be working well. I will merge it into the master branch once the pre-trained model is ready, and I will also update the audio samples.