timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
https://arxiv.org/abs/2001.07676
Apache License 2.0

Missing optimizer step #36


FlorisFok commented 3 years ago

If max_steps or the length of the data is not divisible by gradient_accumulation_steps, some gradients are lost, since the optimizer update only takes place when (step + 1) % gradient_accumulation_steps == 0 is true.
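For context, here is a minimal self-contained sketch of the accumulation pattern in question. The model, optimizer, and data below are toy stand-ins, not pet's actual trainer code; only the if condition mirrors the script.

```python
import torch

# Toy stand-ins; pet's trainer uses a transformer model and a dataloader.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = [torch.randn(8, 4) for _ in range(10)]  # 10 batches
gradient_accumulation_steps = 3  # 10 % 3 != 0, so the tail is dropped

for step, batch in enumerate(data):
    loss = model(batch).pow(2).mean() / gradient_accumulation_steps
    loss.backward()  # gradients accumulate in .grad across iterations
    if (step + 1) % gradient_accumulation_steps == 0:
        optimizer.step()       # fires at steps 2, 5, and 8 only
        optimizer.zero_grad()

# The gradient from batch 9 is still sitting in .grad here and is
# never applied: that is the missing optimizer step.
```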

timoschick commented 3 years ago

Hi @FlorisFok, do you have suggestions as to how this should be fixed?

FlorisFok commented 3 years ago

Hi @timoschick, by adding an OR condition to the gradient-accumulation if statement, so that the update also executes when the loop reaches the final batch:

last_batch = len(train_dataloader) - 1

Then modify the condition as follows: if (step + 1) % gradient_accumulation_steps == 0 or last_batch == b_nr:

Here b_nr (the batch number) is the first value returned by the enumerate call. In theory this could simply reuse the step variable already in the script, but step currently behaves exactly the same as global_step. I think that is also a mistake, though it depends on how the two are meant to be defined.
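Put together, the suggested change would look roughly like this toy sketch (reusing step from enumerate as the batch counter, so no separate b_nr is needed):

```python
import torch

# Same toy stand-ins as above, with the proposed OR condition added
# so the final, partial accumulation window is flushed as well.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = [torch.randn(8, 4) for _ in range(10)]
gradient_accumulation_steps = 3
last_batch = len(data) - 1

for step, batch in enumerate(data):
    loss = model(batch).pow(2).mean() / gradient_accumulation_steps
    loss.backward()
    if (step + 1) % gradient_accumulation_steps == 0 or step == last_batch:
        optimizer.step()       # now also fires on the final batch
        optimizer.zero_grad()
```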