lucidrains / grokfast-pytorch

Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
MIT License
83 stars 4 forks source link