leaderj1001 / Stand-Alone-Self-Attention

Implementing Stand-Alone Self-Attention in Vision Models using PyTorch
MIT License
456 stars, 83 forks

Can anyone train resnet50 successfully without NaN #22

Open ksouvik52 opened 4 years ago

ksouvik52 commented 4 years ago

Hi, I am facing issues training the ResNet50 model on CIFAR-10. Even with a learning rate of 0.01, it suddenly throws NaN after around 10 epochs, so I am not sure how to train the ResNet50 model. Hoping for a quick reply! Thanks.

sammens commented 4 years ago

I am also having the same issue. Did you solve it yet?

theFoxofSky commented 3 years ago

Add BN (batch normalization) to the generated Q, K, and V.

danielmimimi commented 5 months ago

Can you elaborate, @theFoxofSky?
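
One possible reading of the BN suggestion above, as a minimal sketch rather than a confirmed fix: put a `BatchNorm2d` after each 1x1 projection that produces Q, K, and V, so the attention logits stay in a bounded range before the softmax. The class name `AttentionConvBN` and its arguments are hypothetical and not taken from the repository, and relative position embeddings are left out to keep the example short. Whether this removes the NaN on CIFAR-10 is not verified here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionConvBN(nn.Module):
    """Local self-attention with BatchNorm on the Q, K, V projections.

    Hypothetical sketch: not the repository's layer. Relative position
    embeddings are omitted, and kernel_size must equal 2 * padding + 1
    so that the spatial size is preserved.
    """

    def __init__(self, in_channels, out_channels, kernel_size=3, padding=1, groups=1):
        super().__init__()
        assert out_channels % groups == 0, "out_channels must be divisible by groups"
        assert kernel_size == 2 * padding + 1, "sketch assumes 'same' padding"
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.padding = padding
        self.groups = groups

        def projection():
            # 1x1 conv followed by BN: the assumed meaning of "add BN for Q, K, V".
            return nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
                nn.BatchNorm2d(out_channels),
            )

        self.query = projection()
        self.key = projection()
        self.value = projection()

    def forward(self, x):
        b, _, h, w = x.shape
        k2 = self.kernel_size ** 2
        c_per_group = self.out_channels // self.groups

        q = self.query(x)
        k = self.key(x)
        v = self.value(x)

        # Collect the k x k neighbourhood of every pixel: (B, C * k2, H * W).
        k = F.unfold(k, self.kernel_size, padding=self.padding)
        v = F.unfold(v, self.kernel_size, padding=self.padding)

        k = k.reshape(b, self.groups, c_per_group, k2, h * w)
        v = v.reshape(b, self.groups, c_per_group, k2, h * w)
        q = q.reshape(b, self.groups, c_per_group, 1, h * w)

        # Dot product over channels gives one logit per neighbour.
        logits = (q * k).sum(dim=2, keepdim=True)   # (B, G, 1, k2, HW)
        attn = F.softmax(logits, dim=3)             # normalise over the window
        out = (attn * v).sum(dim=3)                 # (B, G, C/G, HW)
        return out.reshape(b, self.out_channels, h, w)
```

To try it, the layer would replace the attention layer inside each bottleneck block, for example `AttentionConvBN(64, 64, kernel_size=7, padding=3, groups=8)`; those numbers are illustrative only.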