leaderj1001 / Stand-Alone-Self-Attention

Implementing Stand-Alone Self-Attention in Vision Models using Pytorch
MIT License
456 stars, 83 forks

Loss is NaN #5

Open phongnhhn92 opened 4 years ago

phongnhhn92 commented 4 years ago

Hello, I am testing your ResNet50 model with stem set to True, and from the first training step my loss is NaN and the accuracy is decreasing. Is that a bug? (screenshot)

Also, I didn't see this problem when training the ResNet26 model.

leaderj1001 commented 4 years ago

Thanks for your comment. I don't have enough GPUs, so I couldn't run experiments on all of the ResNet models. Maybe you can reduce the learning rate, for example to 0.01.

Thank you!
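As an illustration of the suggestion above, here is a minimal sketch of building an optimizer with a smaller learning rate and lowering it further on an existing optimizer. The `nn.Linear` model is only a stand-in for the repository's ResNet50, and the momentum and weight-decay values and the `set_lr` helper are assumptions made for the example, not settings taken from this repository.

```python
import torch
from torch import nn

# Hypothetical stand-in for the ResNet50 model discussed in this issue.
model = nn.Linear(8, 2)

# Start from the smaller learning rate suggested above (0.01).
# momentum and weight_decay are illustrative values, not the repo's defaults.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4
)


def set_lr(optimizer, lr):
    """Overwrite the learning rate of every parameter group in place."""
    for group in optimizer.param_groups:
        group["lr"] = lr


# If training still diverges, the rate can be lowered without rebuilding
# the optimizer (hypothetical value).
set_lr(optimizer, 0.001)
```

Changing `param_groups` in place keeps the optimizer's momentum buffers, which may matter when lowering the rate in the middle of a run.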

ksouvik52 commented 4 years ago

Hi, I am facing issues training the ResNet50 model on CIFAR-10. Even with a learning rate of 0.01, the loss suddenly becomes NaN after around 10 epochs, so I am not quite sure how to train the ResNet50 model. Hoping for a quick reply! Thanks.

Just as a note, ResNet38 and ResNet26 did train successfully without NaN.
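For a NaN that appears suddenly after several epochs, one common way to narrow it down is to check the loss before each update and to clip the gradient norm. The sketch below is not a confirmed fix for this issue. The `train_step` helper, the clipping threshold of 1.0, and the toy `nn.Linear` model are all assumptions made for illustration.

```python
import torch
from torch import nn


def train_step(model, optimizer, criterion, inputs, targets, max_grad_norm=1.0):
    """Run one update; return the loss value, or None if the loss is NaN/inf."""
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    if not torch.isfinite(loss):
        # Skip the update so a bad batch cannot corrupt the weights.
        return None
    loss.backward()
    # Rescale gradients whose total norm exceeds max_grad_norm (assumed value).
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()


# Toy usage with a hypothetical stand-in model and random CIFAR-10-like labels.
torch.manual_seed(0)
model = nn.Linear(4, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
inputs = torch.randn(5, 4)
targets = torch.randint(0, 10, (5,))

loss_value = train_step(model, optimizer, criterion, inputs, targets)
if loss_value is None:
    print("non-finite loss, step skipped")
else:
    print("loss:", loss_value)
```

Logging the epoch and batch index whenever `train_step` returns `None` shows where the first non-finite value occurs. `torch.autograd.set_detect_anomaly(True)` can additionally report which backward operation produced a NaN, at the cost of slower training.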