Thank you for your outstanding work.
I tried to train the model with the default hyperparameters, except for batchsize=3, on 3 RTX 4090 GPUs, but the loss became NaN early in training (at about 4000 iterations). I would appreciate some advice on training. Thanks!