I encountered the following error while trying to train a model from LJSpeech following the steps in README.md:
```
Traceback (most recent call last):
  File "train.py", line 493, in <module>
    main(args)
  File "train.py", line 433, in main
    scheduler = AnnealLR(optimizer, warmup_steps=c.warmup_steps, last_epoch=args.restore_step)
  File "/workspace/TTS/utils/generic_utils.py", line 148, in __init__
    super(AnnealLR, self).__init__(optimizer, last_epoch)
  File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/optim/lr_scheduler.py", line 20, in __init__
    "in param_groups[{}] when resuming an optimizer".format(i))
KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"
```
After digging around a bit, the problem appears to be the `last_epoch=args.restore_step` argument passed to `AnnealLR()`. When not resuming from a checkpoint, `train.py` sets this value to zero on line 425:

```
args.restore_step = 0
```

However, `lr_scheduler.py` expects `-1` for the initial epoch; any other value makes it assume the optimizer is being resumed and look for `initial_lr` in each param group. I changed the zero to `-1` on line 425:

```
args.restore_step = -1
```

and training from scratch now works.
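For reference, here is a simplified sketch of the check that raises this error (paraphrased from the logic in PyTorch's `lr_scheduler.py`, not the exact source). When `last_epoch == -1`, the scheduler records the current `lr` of each param group as `initial_lr`; any other value makes it assume it is resuming a checkpoint and require that `initial_lr` is already present:

```python
def scheduler_init(param_groups, last_epoch=-1):
    """Simplified sketch of _LRScheduler.__init__'s param-group check."""
    if last_epoch == -1:
        # Fresh start: record the current lr as initial_lr in each group.
        for group in param_groups:
            group.setdefault('initial_lr', group['lr'])
    else:
        # Resuming: initial_lr must already exist in every group,
        # otherwise a KeyError like the one above is raised.
        for i, group in enumerate(param_groups):
            if 'initial_lr' not in group:
                raise KeyError(
                    "param 'initial_lr' is not specified "
                    "in param_groups[{}] when resuming an optimizer".format(i))
    return last_epoch
```

This is why `args.restore_step = 0` fails on a fresh run: zero is treated as "resume from step 0" rather than "start from scratch".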
Environment:

- Python 3.6
- PyTorch 0.4.1
- CUDA 9.1