Open brockbrownwork opened 6 days ago
@brockbrownwork it could be because i normalized the learning rate in this repository, for a fair comparison https://github.com/lucidrains/grokfast-pytorch/blob/main/grokfast_pytorch/grokfast.py#L26
@brockbrownwork try setting `normalize_lr` to `False` and it should be equivalent, i hope
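A rough sketch of why learning rate normalization could change the loss curve, in plain Python. It is not the repository's code. It assumes the Grokfast EMA filter adds `lamb` times a running average of the gradient to the gradient itself, and that `normalize_lr=True` divides the learning rate by `1 + lamb`. Both are assumptions, not checked against the linked line, and the helper names `amplified_gradients` and `effective_lr` are made up for illustration.

```python
def amplified_gradients(grads, alpha=0.98, lamb=2.0):
    """Return g + lamb * ema(g) for each gradient in a sequence (assumed Grokfast EMA filter)."""
    ema = None
    out = []
    for g in grads:
        # Exponential moving average of past gradients, seeded with the first one.
        ema = g if ema is None else alpha * ema + (1 - alpha) * g
        out.append(g + lamb * ema)
    return out


def effective_lr(lr, lamb=2.0, normalize_lr=True):
    """Assumed normalization: shrink lr by the steady-state gain of the filter."""
    return lr / (1.0 + lamb) if normalize_lr else lr


lamb = 2.0
# With a constant gradient the EMA equals the gradient, so the filter gain is 1 + lamb.
filtered = amplified_gradients([1.0] * 5, lamb=lamb)

step_normalized = effective_lr(1e-3, lamb, normalize_lr=True) * filtered[-1]
step_raw = effective_lr(1e-3, lamb, normalize_lr=False) * filtered[-1]
```

Under these assumptions the normalized step matches a plain optimizer at the same nominal learning rate, while the unnormalized step is about `1 + lamb` times larger. If the official implementation does not normalize, the same nominal learning rate would take larger steps there, which could account for different convergence.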
I'm seeing a pretty significant difference between the loss plots of this implementation and the official implementation here. This one has trouble converging in my use case (though I did not run it for very long). It's possible that the math is different.