Open · DavideHe opened this issue 1 week ago
Hi @DavideHe, thanks for raising the issue. Could you share a minimal reproducer? The `lr_scheduler` should behave the same when it is passed in a second `prepare`; however, we expect the user to call `prepare` only once. What behavior were you expecting? With `accelerator.accumulate(model)`, the `lr_scheduler` should be updated after every `gradient_accumulation_steps` iterations. See the related issue https://github.com/huggingface/accelerate/issues/963
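For reference, a minimal sketch of the recommended single-`prepare` pattern, assuming the model, optimizer, dataloader, and scheduler have already been constructed (the accumulation count of 4 is an arbitrary example, not from the issue):

```python
from accelerate import Accelerator

# Tell Accelerate the accumulation count so accumulate() can track it.
accelerator = Accelerator(gradient_accumulation_steps=4)

# A single prepare call wraps everything, including the scheduler.
model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, lr_scheduler
)

for batch in train_dataloader:
    with accelerator.accumulate(model):
        loss = model(**batch).loss
        accelerator.backward(loss)
        optimizer.step()
        lr_scheduler.step()  # the prepared scheduler only advances when gradients sync
        optimizer.zero_grad()
```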
prepare twice:

```python
model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader
)
lr_scheduler = accelerator.prepare(lr_scheduler)

for data in train_dataloader:
    with accelerator.accumulate(model):
        lr_scheduler.step()
        print(lr_scheduler.get_last_lr()[-1])
```
As in the code above, the lr updates on every step when `gradient_accumulation_steps > 1`. But with `prepare` called once, the lr updates only every `gradient_accumulation_steps` steps.
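For comparison, a sketch of the prepare-once variant being contrasted here; with the scheduler inside the single `prepare` call, the printed lr should change only every `gradient_accumulation_steps` iterations:

```python
# prepare once: the lr_scheduler goes through the same prepare call
model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader, lr_scheduler
)

for data in train_dataloader:
    with accelerator.accumulate(model):
        lr_scheduler.step()
        print(lr_scheduler.get_last_lr()[-1])  # changes only on sync steps
```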
System Info

Information

Tasks

- `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)

Reproduction
I wrote the code calling `accelerator.prepare` more than once: `lr_scheduler.step()` behaves differently than when `prepare` is called once. With a single `prepare` and `with accelerator.accumulate(model):`, `lr_scheduler.step()` runs `num_processes` times every step (see the code above). With two `prepare` calls, where the `lr_scheduler` is prepared afterwards, `lr_scheduler.step()` runs once every step inside `with accelerator.accumulate(model):`. With the `lr_scheduler` in the second `prepare`, is there some difference in how `with accelerator.accumulate(model):` behaves?
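One way to observe the difference (a diagnostic sketch, not from the original report) is to log `accelerator.sync_gradients`, which is `True` only on the iterations where the accumulated optimizer step actually happens:

```python
for step, data in enumerate(train_dataloader):
    with accelerator.accumulate(model):
        before = lr_scheduler.get_last_lr()[-1]
        lr_scheduler.step()
        after = lr_scheduler.get_last_lr()[-1]
        # With the scheduler in the single prepare call, `after` differs
        # from `before` only when accelerator.sync_gradients is True.
        print(step, accelerator.sync_gradients, before, after)
```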
Expected behavior

An explanation of why the two coding patterns lead to different behavior.