seungwonpark / melgan

MelGAN vocoder (compatible with NVIDIA/tacotron2)
http://swpark.me/melgan/
BSD 3-Clause "New" or "Revised" License

Unable to resume training from official checkpoint #42

Closed · delip closed this issue 4 years ago

delip commented 4 years ago

Hi @seungwonpark

Thanks for all this. I am using your official checkpoint nvidia_tacotron2_LJ11_epoch6400.pt.

When I try to resume training from that checkpoint I get the following error:

2020-01-07 00:26:35,386 - INFO - Resuming from checkpoint: ./nvidia_tacotron2_LJ11_epoch6400.pt
Traceback (most recent call last):
  File "trainer.py", line 52, in <module>
    train(args, pt_dir, args.checkpoint_path, trainloader, valloader, writer, logger, hp, hp_str)
  File "/delip/workspace/melgan/utils/train.py", line 34, in train
    model_d.load_state_dict(checkpoint['model_d'])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 816, in load_state_dict
    state_dict = state_dict.copy()
AttributeError: 'NoneType' object has no attribute 'copy'

Looks like a mismatch between my PyTorch version and the one used for this checkpoint? My PyTorch version is 1.3.0a0+24ae9b5.
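For reference, a quick way to check what is actually inside the checkpoint (a minimal sketch, not code from the repo; the filename is the checkpoint above and 'model_d' is the key from the traceback):

import torch

# Load the checkpoint on CPU and inspect its contents.
checkpoint = torch.load("nvidia_tacotron2_LJ11_epoch6400.pt", map_location="cpu")
print(list(checkpoint.keys()))       # which state_dicts the file actually contains
print(type(checkpoint["model_d"]))   # if this prints <class 'NoneType'>, the .copy() error above follows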

Do you have a checkpoint for the latest PyTorch? Or alternatively, what was the PyTorch version used with this checkpoint?

PS: md5sum for my copy of the checkpoint is 1cb89dc08401770fa9e2dd7d5c704bf5

seungwonpark commented 4 years ago

Hi, @delip. Thanks for your interest.

I'm sorry to say that this error occurs because the checkpoint contains no state_dict for the discriminator, so it is not a PyTorch version mismatch. Due to proprietary concerns at our company, we decided not to include the discriminator's state_dict in the pre-trained checkpoint, so it was replaced with None before the checkpoint was published on GitHub.
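If it helps while the checkpoint stays as-is, one possible workaround (a sketch only, not the official code; model_g and the 'model_g' key are assumptions based on the generator naming, while model_d and 'model_d' come from the traceback) is to guard the resume logic in utils/train.py and train the discriminator from scratch:

# Sketch of a possible guard in utils/train.py (variable names follow the
# trainer.py call above; 'model_g' is an assumption, 'model_d' is from the traceback).
checkpoint = torch.load(args.checkpoint_path, map_location="cpu")
model_g.load_state_dict(checkpoint['model_g'])  # generator weights are included in the release

if checkpoint.get('model_d') is not None:
    model_d.load_state_dict(checkpoint['model_d'])
else:
    # The published checkpoint has checkpoint['model_d'] == None, so keep the
    # discriminator's fresh initialization and train it from scratch on resume.
    logger.info("No discriminator state_dict in checkpoint; training model_d from scratch.")

With a guard like this, resuming from nvidia_tacotron2_LJ11_epoch6400.pt continues generator training while the discriminator starts from random weights, so the adversarial losses will initially behave as they would in a fresh run.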