ayush0x00 closed this issue 1 year ago
Got resolved. Closing the issue.
Can you please explain what was happening, to help people who run into this issue in the future?
I had accidentally modified the PReLU layer in the original code to use the torch.float16 dtype, which was causing the issue. The original code didn't have any issues.
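For anyone debugging the same error: AMP's GradScaler refuses to unscale gradients that are stored in FP16, and a gradient is FP16 whenever its parameter is. Below is a minimal sketch of a check that lists such parameters. The helper name find_fp16_params and the toy model are hypothetical stand-ins for illustration, not part of the AdaFace code.

```python
import torch
from torch import nn


def find_fp16_params(model: nn.Module) -> list[str]:
    """Return the names of parameters stored in torch.float16."""
    return [name for name, p in model.named_parameters() if p.dtype == torch.float16]


# Hypothetical stand-in for the backbone: Linear stays in FP32, PReLU is cast to FP16.
model = nn.Sequential(nn.Linear(8, 8), nn.PReLU().half())
print(find_fp16_params(model))  # ['1.weight']

# Casting the module back to FP32 clears the list; AMP expects FP32 parameters
# and handles the reduced-precision compute itself inside autocast.
model.float()
print(find_fp16_params(model))  # []
```

Running a check like this on the model before the first optimizer step should point at the layer that was cast by mistake.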
I was trying to train AdaFace on a custom dataset with 10k classes. As soon as training started, I got a ValueError (Attempting to unscale FP16 gradients). I understand that FP16 gradients can't be unscaled and that scaling/unscaling is handled internally by AMP, but I am not able to find the root cause of the error. I have also attached a screenshot of the error.