Hi Aaron, I'm trying to train the small model using the default parameters you provided, changing only the number of training steps to 400k. However, the loss becomes NaN after about 5,000 steps. Any advice would be appreciated.
I am also trying to reproduce the results with this code, but I keep getting errors. Since you seem to have reproduced it successfully, could you share any suggestions or help? Thank you.