Fix gradient accumulation with small dataset sizes

timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"

https://arxiv.org/abs/2001.07676

Apache License 2.0

1.62k stars, 283 forks

Fix gradient accumulation with small dataset sizes #25

Closed. nelson-liu closed this 3 years ago

nelson-liu commented 3 years ago

Right now, step is reset at the start of every epoch. This is problematic if, say, you're training on a dataset with 8 examples, have a per-GPU batch size of 8, and want to accumulate gradients over 4 steps. Each epoch then contains a single batch, so you'll never reach step = 4 and will never perform a gradient update.

This fix just persists step across epochs.
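
To make the arithmetic concrete, here is a minimal sketch that only counts how many optimizer updates a simple accumulation loop would perform. It is not the repository's training loop: the function name count_optimizer_updates, its parameters, and the choice of 8 epochs are assumptions made for illustration, and the backward pass and optimizer call are represented by comments only.

```python
def count_optimizer_updates(num_examples, batch_size, accumulation_steps,
                            num_epochs, persist_step):
    """Count optimizer updates in a simplified gradient-accumulation loop.

    Hypothetical helper for illustration; not taken from the pet codebase.
    """
    batches_per_epoch = -(-num_examples // batch_size)  # ceiling division
    updates = 0
    step = 0
    for _ in range(num_epochs):
        if not persist_step:
            step = 0  # counter restarts every epoch (the problem described above)
        for _ in range(batches_per_epoch):
            # A real loop would run the forward pass and loss.backward() here,
            # accumulating gradients without updating the weights.
            step += 1
            if step % accumulation_steps == 0:
                # A real loop would call optimizer.step() and zero the gradients here.
                updates += 1
    return updates


# 8 examples, batch size 8, accumulate over 4 steps, 8 epochs (assumed).
reset = count_optimizer_updates(8, 8, 4, 8, persist_step=False)
persisted = count_optimizer_updates(8, 8, 4, 8, persist_step=True)
print(reset, persisted)  # prints "0 2"
```

With the counter reset each epoch, step never exceeds 1 and no update happens; with the counter persisted, an update happens every fourth epoch.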

timoschick commented 3 years ago

Well spotted, thanks!