Open: Fadelis98 opened this issue 1 month ago
`step_was_skipped` is only set if we hit an overflow, not if we are accumulating gradients. We could potentially rename it to `overflow_hit`, since that is its true intent; it is not meant to signal that we are not yet syncing the gradients.
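For context, the overflow detection this refers to is the usual fp16 pattern of watching the `GradScaler`'s scale. A minimal sketch of that pattern, assuming plain `torch.cuda.amp` (the toy model and variable names are illustrative, not accelerate's actual source):

```python
import torch

# Toy setup; any model/optimizer works for illustrating the pattern.
model = torch.nn.Linear(8, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

with torch.autocast("cuda", dtype=torch.float16):
    loss = model(torch.randn(2, 8, device="cuda")).sum()
scaler.scale(loss).backward()

scale_before = scaler.get_scale()
scaler.step(optimizer)  # silently skips the step if inf/nan gradients are found
scaler.update()
scale_after = scaler.get_scale()

# GradScaler shrinks its scale after an overflow, so a drop in the scale
# is the signal that the step was skipped: the proposed "overflow_hit".
overflow_hit = scale_after < scale_before
print(overflow_hit)
```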
System Info
Information
Tasks
- A `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)

Reproduction
The value of `optimizer.step_was_skipped`, as it is named, should be `True` whenever `optimizer.step()` is called but the step is not actually applied to the parameters. The logic is implemented here, inside the `if self.gradient_state.sync_gradients` condition. The standard implementation of gradient accumulation uses this same condition to decide whether to actually step the optimizer, so `optimizer.step_was_skipped` is always `False` on accumulation steps.
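A minimal script along these lines (a sketch with a toy model and random data, not the original report's code) prints `step_was_skipped` as `False` on every step, including the three out of every four steps where the optimizer does not actually update the parameters:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=4)
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

for step in range(8):
    batch = torch.randn(2, 8, device=accelerator.device)
    with accelerator.accumulate(model):
        loss = model(batch).sum()
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
    # sync_gradients is False on accumulation steps, where no real optimizer
    # step happens, yet step_was_skipped is still reported as False.
    print(step, accelerator.sync_gradients, optimizer.step_was_skipped)
```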
Expected behavior
If this is the expected behaviour, rename `optimizer.step_was_skipped` or note this behaviour in the docstring. Otherwise, fix its logic so that it returns `True` when the step is skipped because gradients are still being accumulated.
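For the second option, a hypothetical sketch of what the property could look like (the attribute name `_is_overflow` is an assumption about accelerate's internals and may not match the actual source):

```python
@property
def step_was_skipped(self) -> bool:
    """Whether the last `step()` call did not update the parameters."""
    # Skipped either because an fp16 overflow was detected (assumed to be
    # tracked in self._is_overflow) or because gradient accumulation has
    # not yet reached a sync boundary.
    return self._is_overflow or not self.gradient_state.sync_gradients
```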