tom-roddick / oft

MIT License
200 stars 50 forks source link

learning rate #18

Open jiayily opened 2 years ago

jiayily commented 2 years ago

In the paper you said "The model is trained using SGD for 600 epochs with a batch size of 8, momentum of 0.9 and learning rate of 10−7". Why we need to set learning rate so small?