timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
https://arxiv.org/abs/2001.07676
Apache License 2.0

Missing optimizer step #36


FlorisFok commented 3 years ago

If max_steps or the length of the data is not divisible by gradient_accumulation_steps, some gradients are lost, since the optimizer update only takes place when (step + 1) % gradient_accumulation_steps == 0 is true.
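For context, here is a minimal self-contained sketch of the accumulation pattern in question. The model, optimizer, and data below are toy stand-ins, not pet's actual trainer code; only the if condition mirrors the script.

```python
import torch

# Toy stand-ins; pet's trainer uses a transformer model and a dataloader.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = [torch.randn(8, 4) for _ in range(10)]  # 10 batches
gradient_accumulation_steps = 3  # 10 % 3 != 0, so the tail is dropped

for step, batch in enumerate(data):
    loss = model(batch).pow(2).mean() / gradient_accumulation_steps
    loss.backward()  # gradients accumulate in .grad across iterations
    if (step + 1) % gradient_accumulation_steps == 0:
        optimizer.step()       # fires at steps 2, 5, and 8 only
        optimizer.zero_grad()

# The gradient from batch 9 is still sitting in .grad here and is
# never applied: that is the missing optimizer step.
```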

timoschick commented 3 years ago

Hi @FlorisFok, do you have suggestions as to how this should be fixed?

FlorisFok commented 3 years ago

Hi @timoschick, by adding an OR condition to the gradient-accumulation if statement, so that the update also executes when the loop reaches the final batch:

last_batch = len(train_dataloader) - 1

Then modify the condition as follows: if (step + 1) % gradient_accumulation_steps == 0 or last_batch == b_nr:

Here b_nr (the batch number) is the first value returned by the enumerate call. In theory this could simply reuse the step variable already in the script, but step currently behaves exactly the same as global_step. I think that is also a mistake, though it depends on how the two are meant to be defined.
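Put together, the suggested change would look roughly like this toy sketch (reusing step from enumerate as the batch counter, so no separate b_nr is needed):

```python
import torch

# Same toy stand-ins as above, with the proposed OR condition added
# so the final, partial accumulation window is flushed as well.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = [torch.randn(8, 4) for _ in range(10)]
gradient_accumulation_steps = 3
last_batch = len(data) - 1

for step, batch in enumerate(data):
    loss = model(batch).pow(2).mean() / gradient_accumulation_steps
    loss.backward()
    if (step + 1) % gradient_accumulation_steps == 0 or step == last_batch:
        optimizer.step()       # now also fires on the final batch
        optimizer.zero_grad()
```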