Open jiayily opened 3 years ago
In the paper you said, "The model is trained using SGD for 600 epochs with a batch size of 8, momentum of 0.9 and learning rate of 10^-7". Why do we need to set the learning rate so small?