ironjr/grokfast
Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients"
https://arxiv.org/abs/2405.20233
MIT License · 342 stars · 26 forks
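For context on the issues below: the method described in the paper amplifies the slow-varying component of the gradients by keeping an exponential moving average (EMA) of each parameter's gradient and adding it back to the raw gradient, scaled by a factor lamb, before the optimizer step. The sketch below is a minimal illustration of that idea, not the repository's exact API; the helper name ema_gradfilter and the default values alpha=0.98 and lamb=2.0 are assumptions for illustration only.

```python
from typing import Dict, Optional
import torch
import torch.nn as nn

def ema_gradfilter(model: nn.Module,
                   ema: Optional[Dict[str, torch.Tensor]] = None,
                   alpha: float = 0.98,
                   lamb: float = 2.0) -> Dict[str, torch.Tensor]:
    """Amplify the slow (low-frequency) gradient component in place.

    Call after loss.backward() and before optimizer.step().
    alpha: EMA decay for the slow gradient estimate (illustrative default).
    lamb:  amplification factor added back onto the raw gradient (illustrative default).
    """
    if ema is None:
        # Initialize the EMA with the first observed gradients.
        ema = {n: p.grad.detach().clone()
               for n, p in model.named_parameters()
               if p.requires_grad and p.grad is not None}
    for n, p in model.named_parameters():
        if p.requires_grad and p.grad is not None:
            # Update the slow-gradient estimate and amplify it: g <- g + lamb * ema
            ema[n] = ema[n] * alpha + p.grad.detach() * (1.0 - alpha)
            p.grad.add_(ema[n], alpha=lamb)
    return ema

# Illustrative usage inside a standard training loop:
#   loss.backward()
#   ema = ema_gradfilter(model, ema)
#   optimizer.step()
```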
Issues
#12 · AdamW better than grokfast + Adam? · Zhi0467 · closed 1 week ago · 2 comments
#11 · Is this specific to transformers? · phalexo · closed 1 week ago · 2 comments
#10 · How to use Grokfast with FP16 mixed precision training? · peterjc123 · opened 2 weeks ago · 2 comments
#9 · Exploding Gradients · DustinEwan · closed 1 week ago · 1 comment
#8 · Feature/kalman filter · khari998 · opened 3 weeks ago · 5 comments
#7 · Help to find hyper parameters for LLama 2 · 50Bytes-dev · opened 3 weeks ago · 0 comments
#6 · Was trying to stick this code into Trainer's inner training loop. · phalexo · closed 3 weeks ago · 0 comments
#5 · mps compatibility · d0rc · closed 3 weeks ago · 1 comment
#4 · Any experiments / gotchas to be aware of when using schedulefree optimizer? · dawood95 · opened 4 weeks ago · 0 comments
#3 · Bug Fix for Handling None Gradients · majirky · closed 1 month ago · 1 comment
#2 · Anyone already working on including this in transformers? · l4b4r4b4b4 · opened 1 month ago · 4 comments
#1 · Choosing weight decay? · TKassis · closed 1 month ago · 3 comments