ironjr/grokfast
Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients"
https://arxiv.org/abs/2405.20233
MIT License · 342 stars · 26 forks
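For context on the issues below: the method described in the paper amplifies the slow-varying component of the gradients by keeping an exponential moving average (EMA) of each parameter's gradient and adding it back to the raw gradient, scaled by a factor lamb, before the optimizer step. The sketch below is a minimal illustration of that idea, not the repository's exact API; the helper name ema_gradfilter and the default values alpha=0.98 and lamb=2.0 are assumptions for illustration only.

```python
from typing import Dict, Optional
import torch
import torch.nn as nn

def ema_gradfilter(model: nn.Module,
                   ema: Optional[Dict[str, torch.Tensor]] = None,
                   alpha: float = 0.98,
                   lamb: float = 2.0) -> Dict[str, torch.Tensor]:
    """Amplify the slow (low-frequency) gradient component in place.

    Call after loss.backward() and before optimizer.step().
    alpha: EMA decay for the slow gradient estimate (illustrative default).
    lamb:  amplification factor added back onto the raw gradient (illustrative default).
    """
    if ema is None:
        # Initialize the EMA with the first observed gradients.
        ema = {n: p.grad.detach().clone()
               for n, p in model.named_parameters()
               if p.requires_grad and p.grad is not None}
    for n, p in model.named_parameters():
        if p.requires_grad and p.grad is not None:
            # Update the slow-gradient estimate and amplify it: g <- g + lamb * ema
            ema[n] = ema[n] * alpha + p.grad.detach() * (1.0 - alpha)
            p.grad.add_(ema[n], alpha=lamb)
    return ema

# Illustrative usage inside a standard training loop:
#   loss.backward()
#   ema = ema_gradfilter(model, ema)
#   optimizer.step()
```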
Issues
#12 · AdamW better than grokfast + Adam? · Zhi0467 · closed 1 week ago · 2 comments
#11 · Is this specific to transformers? · phalexo · closed 1 week ago · 2 comments
#10 · How to use Grokfast with FP16 mixed precision training? · peterjc123 · opened 2 weeks ago · 2 comments
#9 · Exploding Gradients · DustinEwan · closed 1 week ago · 1 comment
#8 · Feature/kalman filter · khari998 · opened 3 weeks ago · 5 comments
#7 · Help to find hyper parameters for LLama 2 · 50Bytes-dev · opened 3 weeks ago · 0 comments
#6 · Was trying to stick this code into Trainer's inner training loop. · phalexo · closed 3 weeks ago · 0 comments
#5 · mps compatibility · d0rc · closed 3 weeks ago · 1 comment
#4 · Any experiments / gotchas to be aware of when using schedulefree optimizer? · dawood95 · opened 4 weeks ago · 0 comments
#3 · Bug Fix for Handling None Gradients · majirky · closed 1 month ago · 1 comment
#2 · Anyone already working on including this in transformers? · l4b4r4b4b4 · opened 1 month ago · 4 comments
#1 · Choosing weight decay? · TKassis · closed 1 month ago · 3 comments