Error with 'initial_lr' parameter

spakhomov commented 5 years ago

Environment:

Python 3.6, PyTorch 0.4.1, CUDA 9.1

I encountered the following error while trying to train a model from LJSpeech following the steps in README.md:

Traceback (most recent call last):
  File "train.py", line 493, in <module>
    main(args)
  File "train.py", line 433, in main
    scheduler = AnnealLR(optimizer, warmup_steps=c.warmup_steps, last_epoch=args.restore_step)
  File "/workspace/TTS/utils/generic_utils.py", line 148, in __init__
    super(AnnealLR, self).__init__(optimizer, last_epoch)
  File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/optim/lr_scheduler.py", line 20, in __init__
    "in param_groups[{}] when resuming an optimizer".format(i))
KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"

After digging around a bit, it looks like the problem is the 'last_epoch=args.restore_step' argument in the AnnealLR() call. When no checkpoint is being restored, train.py sets this argument to zero on line 425:

args.restore_step = 0

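However, the lr_scheduler.py module expects -1 for the initial epoch; judging by the error message, any other value is treated as resuming an optimizer, which requires 'initial_lr' to already be present in each param group. I changed zero to -1 on line 425: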

args.restore_step = -1

and the training from scratch seems to be working now.
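For anyone wondering why zero trips the error, here is a minimal sketch of the check the scheduler constructor appears to perform. It is a simplified paraphrase written from the traceback, not the actual PyTorch source: `init_scheduler_state` is a hypothetical name, and a plain list of dicts stands in for `optimizer.param_groups`.

```python
# Simplified paraphrase of the check made when a scheduler is constructed.
# init_scheduler_state is a hypothetical helper, not a PyTorch function;
# param_groups stands in for optimizer.param_groups (a list of dicts).

def init_scheduler_state(param_groups, last_epoch=-1):
    """Return the base learning rates, mimicking the constructor's check."""
    if last_epoch == -1:
        # Fresh run: remember each group's starting learning rate.
        for group in param_groups:
            group.setdefault("initial_lr", group["lr"])
    else:
        # Resuming: the starting learning rate must already be recorded.
        for i, group in enumerate(param_groups):
            if "initial_lr" not in group:
                raise KeyError(
                    "param 'initial_lr' is not specified "
                    "in param_groups[{}] when resuming an optimizer".format(i)
                )
    return [group["initial_lr"] for group in param_groups]


# last_epoch=0 on a brand-new optimizer reproduces the error.
try:
    init_scheduler_state([{"lr": 0.001}], last_epoch=0)
    raised = False
except KeyError:
    raised = True

# last_epoch=-1 records initial_lr and succeeds.
base_lrs = init_scheduler_state([{"lr": 0.001}], last_epoch=-1)
```

With a fresh optimizer, only the -1 path fills in 'initial_lr', which is consistent with the one-line change above making training from scratch work.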

erogol commented 5 years ago

Yeah, I thought I fixed it on master but obviously not. Good catch. I'll fix it asap.

erogol commented 5 years ago

Updated the master, it should work fine now.

mozilla / TTS

Error with 'initial_lr' parameter #57