mgrankin / ru_transformers


Seems like optimizer.step() has been overridden after learning rate scheduler initialization #11

Closed · piegu closed this issue 4 years ago

piegu commented 4 years ago

Hi,

I'm using your run_lm_finetuning.py script. It works, but I would like to know why, at the start of training, I get:

/opt/anaconda3/envs/gpt/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:91: 
UserWarning: Seems like `optimizer.step()` has been overridden after learning rate scheduler 
initialization. Please, make sure to call `optimizer.step()` before `lr_scheduler.step()`. 
See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)

It is strange, because we can see that optimizer.step() is indeed called before scheduler.step() (see line 359).
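For reference, my understanding of the order the warning refers to is the following (a minimal, self-contained sketch with a dummy model and a placeholder LambdaLR schedule, not the actual code from run_lm_finetuning.py):

import torch
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(10, 2)                                # dummy model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: 1.0)   # placeholder schedule

for _ in range(3):                                            # stand-in for the real dataloader loop
    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    optimizer.step()        # 1) update the weights first
    scheduler.step()        # 2) then advance the learning-rate schedule
    optimizer.zero_grad()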

Any thought about that?

mgrankin commented 4 years ago

I also got similar warnings; it's OK.

https://github.com/NVIDIA/apex/issues/318#issuecomment-493797866
https://discuss.pytorch.org/t/cyclic-learning-rate-how-to-use/53796/2

piegu commented 4 years ago

[ EDIT ] WRONG solution. DO NOT follow this post.


Thanks for the links.

I made the following changes in your run_lm_finetuning.py and the warning is gone now:

mgrankin commented 4 years ago

You shouldn't delete the line with scheduler.step() inside the learning loop if you use LR scheduling.

piegu commented 4 years ago

OK, but if I replace only scheduler.step() with the following code, I'm still getting the warning.

# apex skips optimizer.step() when the loss scale overflows; this check (from the
# apex issue linked above) only advances the LR schedule after a real optimizer step
if amp._amp_state.loss_scalers[0]._unskipped != 0:  # assuming you are using a single optimizer
    scheduler.step()

How do I implement the following comment?

To avoid this warning, initialize the scheduler after running amp.initialize(model, optimizer, opt_level).
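Would the following ordering be the right way to implement it? (A minimal sketch with a dummy model and made-up hyperparameters; it assumes apex and a GPU, and get_linear_schedule_with_warmup only stands in for whatever scheduler the script actually builds.)

import torch
from apex import amp
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(10, 2).cuda()                         # dummy model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# 1) let apex patch the model/optimizer first ...
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

# 2) ... and only then create the scheduler from the already-patched optimizer,
#    so the LR scheduler sees the optimizer.step() that will actually be called
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=10000
)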

mgrankin commented 4 years ago

I'd appreciate it if you could let me know when you find out the proper way to get rid of the warning.

piegu commented 4 years ago

I'd appreciate it if you could let me know when you find out the proper way to get rid of the warning.

In fact, when I see a warning, I try to find the reason.

Maybe I could just ignore this warning (i.e., your code works well), but I guess you agree with me that there is a reason behind it.

Note: I am asking you many questions because I am training GPT-2 in a language other than English and Russian (Portuguese), so I have quite a few questions ;-) about the training process.