lucidrains / reformer-pytorch

Reformer, the efficient Transformer, in Pytorch
MIT License

Reformer pre-training gradient accumulation #97

Closed: apoorv2904 closed this issue 4 years ago

apoorv2904 commented 4 years ago

Hi,

I was looking at the Reformer pre-training script and noticed that `optimizer.step()` is called at every step, not after every `gradient_accumulate` steps. Is this a bug, or am I missing something?
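
For reference, a minimal sketch of the gradient accumulation pattern being described, where `optimizer.step()` fires only once per accumulation cycle. The model, batch source, loss, and the `GRADIENT_ACCUMULATE_EVERY` name are placeholders for illustration, not the repo's actual script:

```python
import torch
from torch import nn

GRADIENT_ACCUMULATE_EVERY = 4  # hypothetical accumulation count

model = nn.Linear(512, 512)                    # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def get_batch():
    return torch.randn(8, 512)                 # placeholder batch

for step in range(1000):
    # accumulate gradients over several micro-batches
    for _ in range(GRADIENT_ACCUMULATE_EVERY):
        x = get_batch()
        loss = model(x).pow(2).mean()          # placeholder loss
        # scale so the accumulated gradient matches one large batch
        (loss / GRADIENT_ACCUMULATE_EVERY).backward()

    # step and clear gradients only once per accumulation cycle
    optimizer.step()
    optimizer.zero_grad()
```

Calling `optimizer.step()` inside the inner loop instead would update the weights on partially accumulated gradients, which defeats the purpose of accumulating.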

lucidrains commented 4 years ago

@apoorv2904 that was actually pull-requested in some time ago, early on when the repository was still in flux. Feel free to send over a fix!

lucidrains commented 4 years ago

@apoorv2904 I put in a quick fix here: https://github.com/lucidrains/reformer-pytorch/commit/42a8682ff8e7cec3122eff6febc9087f1c53f370. Please reopen if it doesn't work!