ironjr / grokfast

Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients"
https://arxiv.org/abs/2405.20233
MIT License
476 stars 39 forks

Bug Fix for Handling None Gradients #3

Closed majirky closed 2 months ago

majirky commented 2 months ago

While trying out your amazing idea, I noticed a bug that appears when some parameters are initialized but never actually used in training. The error AttributeError: 'NoneType' object has no attribute 'data' occurs because p.grad is None for these unused parameters: they take no part in the forward pass, so the backward pass never assigns them a gradient.
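For readers who want to see the shape of such a fix, here is a minimal sketch of an EMA-style gradient filter that skips parameters whose gradient is None. It is an illustration under stated assumptions, not the exact patch from this pull request. The function name gradfilter_ema, the defaults alpha=0.98 and lamb=2.0, and the lazy seeding of the running average are assumptions made for the example. ToyModel is a hypothetical stand-in that keeps the sketch free of dependencies.

```python
from types import SimpleNamespace


def gradfilter_ema(model, grads=None, alpha=0.98, lamb=2.0):
    """EMA gradient filter that tolerates parameters without gradients.

    `model` only needs a `named_parameters()` method yielding (name, param)
    pairs, where each param exposes `requires_grad` and `grad`.
    The defaults for `alpha` and `lamb` are assumptions for this sketch.
    """
    if grads is None:
        grads = {}
    for name, p in model.named_parameters():
        # The guard: a parameter that never took part in the forward pass
        # still has p.grad set to None after the backward pass, so skip it.
        if not p.requires_grad or p.grad is None:
            continue
        # Seed the running average with the first gradient seen for this name.
        prev = grads.get(name, p.grad)
        grads[name] = prev * alpha + p.grad * (1 - alpha)
        # Add the amplified slow component back onto the raw gradient.
        p.grad = p.grad + grads[name] * lamb
    return grads


class ToyModel:
    """Hypothetical stand-in: one trained parameter, one unused parameter."""

    def __init__(self):
        self.params = {
            "used": SimpleNamespace(requires_grad=True, grad=1.0),
            "unused": SimpleNamespace(requires_grad=True, grad=None),
        }

    def named_parameters(self):
        return self.params.items()


model = ToyModel()
state = gradfilter_ema(model)  # the unused parameter is skipped, no AttributeError
```

In a real training loop the call would sit between the backward pass and the optimizer step, with the returned dictionary passed back in as grads on the next iteration. The function relies only on named_parameters(), requires_grad, and grad, so a PyTorch module would be expected to work with the same call, although that is not exercised here.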

I know it is a small contribution, so feel free to reject this pull request if it does not fit your requirements.

ironjr commented 2 months ago

Your contribution is very much appreciated!