Closed: saniazahan closed this issue 3 years ago
Update: I removed autograd anomaly detection, and it's training smoothly for now. Maybe training in half precision was the culprit: autograd's anomaly detection flags the gradient overflow as an error before GradScaler comes into action.
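For context, here is a minimal sketch of the interaction, using PyTorch's native `torch.cuda.amp` rather than the repo's exact setup; the model, data, and hyperparameters are placeholders:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the repo's model, data, and optimizer; only
# the AMP mechanics below matter, not the architecture.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

# With anomaly detection enabled, autograd raises as soon as an fp16
# overflow produces inf/NaN gradients, before GradScaler gets a chance
# to skip that step and lower the loss scale. Leaving it off (the
# default) lets the scaler absorb occasional overflows.
# torch.autograd.set_detect_anomaly(True)

for _ in range(10):
    x = torch.randn(8, 16, device="cuda")
    y = torch.randint(0, 4, (8,), device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # forward pass runs in mixed precision
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()     # backward on the scaled loss
    scaler.step(optimizer)            # step is skipped if grads overflowed
    scaler.update()                   # loss scale adjusts dynamically
```

In other words, an occasional overflow is expected in fp16 training; GradScaler's skip-and-rescale is the intended handling, while anomaly mode turns it into a hard error.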
Dude, you really saved my life. Thanks!
Hi, thank you so much for sharing your work. I am trying to reproduce the results using the NTU X-Sub dataset you provided, training in half precision (AMP level 1). But at epoch 33 I got NaN output from a BatchNorm layer; it originated from the `out = tempconv(x)` call in ms_tcn.py. I had autograd anomaly detection on, and all config settings are kept as in your repo. Could you please suggest why this happened?
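For reference, this is roughly how anomaly detection surfaces such a failure; a tiny standalone illustration, unrelated to the repo's code:

```python
import torch

torch.autograd.set_detect_anomaly(True)

x = torch.zeros(1, requires_grad=True)
y = x / x      # 0/0 produces NaN
y.backward()   # raises a RuntimeError naming the backward function
               # (DivBackward0) that produced the NaN, along with a
               # traceback of the corresponding forward call
```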