Rajat-Mehta opened this issue 4 years ago (status: open).
Hi,
I wanted to know: is there a specific reason you are using SGD with momentum rather than more recent optimizers like Adam and AdaGrad?
How would the model perform if I used Adam? Would you suggest doing so?
Thanks

Hi @Rajat-Mehta, I followed the traditional ResNet training method from the original paper, which uses the SGD optimizer with momentum.
You could also try Adam, but set the learning rate carefully (for example, 3e-4); it should work.
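Below is a framework-neutral sketch in plain Python of the two update rules discussed here. It is not this repository's training code. The toy loss f(w) = (w - 3)^2 and the helper names (`grad_fn`, `run_sgd`, `run_adam`) are hypothetical and chosen only for illustration. The SGD settings (lr 0.1, momentum 0.9) are the ones the ResNet paper reports, though weight decay is left out for brevity. The Adam settings use the paper's default betas and eps together with the 3e-4 learning rate suggested above.

```python
import math


def sgd_momentum_step(w, v, grad, lr=0.1, momentum=0.9):
    # Heavy-ball momentum in the PyTorch convention: v <- mu*v + g, w <- w - lr*v
    v = momentum * v + grad
    return w - lr * v, v


def adam_step(w, m, s, grad, t, lr=3e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: exponential averages of g and g^2, bias-corrected (t starts at 1)
    m = beta1 * m + (1 - beta1) * grad
    s = beta2 * s + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(s_hat) + eps), m, s


def grad_fn(w):
    # Hypothetical toy loss f(w) = (w - 3)^2, minimised at w = 3
    return 2.0 * (w - 3.0)


def run_sgd(steps, lr=0.1, momentum=0.9):
    w, v = 0.0, 0.0
    for _ in range(steps):
        w, v = sgd_momentum_step(w, v, grad_fn(w), lr, momentum)
    return w


def run_adam(steps, lr=3e-4):
    w, m, s = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        w, m, s = adam_step(w, m, s, grad_fn(w), t, lr)
    return w


if __name__ == "__main__":
    print("SGD+momentum, 1 step:   ", run_sgd(1))
    print("Adam (lr=3e-4), 1 step: ", run_adam(1))
    print("SGD+momentum, 500 steps:", run_sgd(500))
```

On the first step, SGD moves w by lr × gradient (0.6 here). Adam moves it by roughly lr (about 3e-4) regardless of the gradient's magnitude, because it divides by the running RMS of the gradient. This scale difference is one reason an Adam learning rate is usually chosen much smaller than SGD's 0.1 and has to be tuned separately rather than copied over.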