Closed: lyndonlauder closed this issue 4 months ago
I have lost my log. If I have enough GPUs, I will retrain it. You might need to lower the initial learning rate to 0.0003.
Thank you. Do I need to do anything differently to track the gradients with mixed precision? The value sometimes shows inf or large numbers like 29771; this doesn't happen with fp32.
It is normal for the gradient values to be large with mixed precision. I have encountered inf values before, but as long as the loss is normal, there is no issue.
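A minimal sketch of why this happens (plain NumPy arithmetic, not SCNet's code): mixed-precision training multiplies the loss by a scale factor so small fp16 gradients don't underflow, so the raw gradient numbers you see are scale-times larger than the true ones, and a genuinely large gradient can overflow fp16 to inf. The scale value and gradient values below are made up for illustration.

```python
import numpy as np

scale = 2.0 ** 16                       # hypothetical loss-scale factor
true_grad = np.float32(0.45)            # hypothetical true gradient value

# What you observe before unscaling: a large number, similar in size
# to the 29771 mentioned above.
scaled = np.float16(true_grad * scale)
print(scaled)

# A gradient that is already large overflows fp16 to inf once scaled:
print(np.float16(np.float32(1.5) * scale))  # inf

# Dividing the scale back out recovers (approximately) the true value:
print(np.float32(scaled) / scale)       # close to 0.45
```

So inf or huge gradient readings under mixed precision are usually the loss scale showing through, not a sign the model has diverged, as long as the loss itself stays normal.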
Hello, I am training the large SCNet model with mixed precision. The gradient is very high and sometimes inf, and I also got a NaN loss at epoch 21. Do you have any advice for tracking/improving the stability of the model with mixed precision? Could you please share your training log for the large model?
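One common stability pattern for this situation (a hedged sketch, not SCNet's actual training loop; the tiny `torch.nn.Linear` model and optimizer below are placeholders) is to compute the gradient norm, clip it, and skip the optimizer step when it is non-finite. In a real AMP loop you would call `scaler.unscale_(optimizer)` before this point so the logged numbers are the true gradients, and use `scaler.step(optimizer)` / `scaler.update()` instead of a plain `step()`.

```python
import torch

# Placeholder model/optimizer standing in for SCNet and its optimizer.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=3e-4)

x, y = torch.randn(16, 8), torch.randn(16, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()

# In an AMP loop, scaler.unscale_(optimizer) would go here, so the
# clipping and logging below see unscaled gradients.
grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
print(f"grad norm: {grad_norm.item():.4f}")

if torch.isfinite(grad_norm):
    optimizer.step()      # with AMP: scaler.step(optimizer); scaler.update()
optimizer.zero_grad()
```

Skipping non-finite steps and clipping, combined with the lower initial learning rate suggested above, are the usual first things to try against NaN losses under mixed precision.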
Here is my training log