LaCandela opened this issue 4 years ago

Hi! I have a question about the linear learning rate scaling you are using. In the publication https://arxiv.org/abs/1706.02677 this scaling rule is only proven for SGD, but you are using Adam. Did you do, or do you know of, any experiments that back up this approach?

I searched the learning rate locally (2.5e-4 and 6.125e-5 for batch size 32); the performance differences are within the random noise (<0.4 COCO AP).

OK, thank you for your answer! Did you experiment with different batch sizes and correspondingly up-/downscaled learning rates with Adam, to see whether the linear scaling rule holds?
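For readers unfamiliar with the rule under discussion, here is a minimal sketch of how the learning rate would be rescaled when the batch size changes. The `scale_lr` helper is hypothetical (not from any repository in this thread); the `"linear"` rule is the one proven for SGD in the paper above, and the `"sqrt"` rule is an alternative sometimes suggested for adaptive optimizers such as Adam.

```python
def scale_lr(base_lr: float, base_batch: int, new_batch: int,
             rule: str = "linear") -> float:
    """Rescale a tuned learning rate for a new batch size.

    rule="linear": lr grows proportionally to batch size
                   (the rule from https://arxiv.org/abs/1706.02677, SGD).
    rule="sqrt":   lr grows with the square root of batch size,
                   an alternative sometimes proposed for Adam.
    """
    ratio = new_batch / base_batch
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * ratio ** 0.5
    raise ValueError(f"unknown rule: {rule!r}")

# Example: a learning rate tuned at batch size 32, moved to batch size 256.
print(scale_lr(1e-4, 32, 256))          # linear: 8x the base rate
print(scale_lr(1e-4, 32, 256, "sqrt"))  # sqrt: ~2.83x the base rate
```

Whether the linear variant transfers to Adam is exactly the open question in this thread; the local search reported above only probes it at a single batch size.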