Closed: priyammaz closed this issue 1 week ago
I have been logging the learning rate on wandb and it looks like this (training for 90 epochs and multiplying the learning rate by 0.1 every 30 epochs). But as you can see, I was training this model on 2 GPUs, so the scheduler is multiplying the learning rate by 0.1 every 15 epochs instead (i.e. going twice as fast).
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Information
Tasks
One of the scripts in the examples folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
Reproduction
I am experimenting with different schedulers and noticed a small problem. Here is the skeleton of the training script, nothing fancy:
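(A minimal sketch of that kind of skeleton, with a toy model and dataset standing in for the real ones and the warmup omitted for brevity; the actual script may differ in the details.)

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from torch.optim.lr_scheduler import StepLR
from accelerate import Accelerator

accelerator = Accelerator()

# Toy stand-ins for the real model and data
model = nn.Linear(10, 2)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=8, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=20, gamma=0.1)  # cut the LR by 10x every 20 epochs

model, optimizer, loader, scheduler = accelerator.prepare(model, optimizer, loader, scheduler)

criterion = nn.CrossEntropyLoss()
for epoch in range(100):
    for x, y in loader:
        loss = criterion(model(x), y)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
    scheduler.step()  # called once per epoch on each process
    accelerator.print(epoch, optimizer.param_groups[0]["lr"])
```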
Expected behavior
What I want is basically: over the 100 epochs I will train the model, the first 4 epochs should be a warmup, and then every 20 epochs after that the learning rate should be reduced by a factor of 0.1. This works totally fine on a single GPU, but when using two GPUs it goes through the scheduler twice as fast, as if scheduler.step() is being called twice. Should I wrap scheduler.step() so it only occurs on the main GPU using accelerator.is_local_main_process, or multiply everything by the number of GPUs, or is there a better way to do this that I am missing?
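For illustration, this is how the scheduler setup in the skeleton above could change under the "multiply everything by the number of GPUs" option; it is only a sketch, and it assumes the prepared scheduler really does advance num_processes times per .step() call, which is what the logged learning-rate curve suggests:

```python
# Replace the scheduler creation in the skeleton above with a step size scaled
# by the number of processes, so the effective schedule matches single-GPU runs.
scheduler = StepLR(optimizer, step_size=20 * accelerator.num_processes, gamma=0.1)
model, optimizer, loader, scheduler = accelerator.prepare(model, optimizer, loader, scheduler)
```

One thing to watch out for with the other idea: stepping the scheduler only on the main process would leave the learning rate unchanged on the remaining processes, since each process holds its own optimizer, so every process needs to advance its own copy of the schedule.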