sail-sg / Adan

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Apache License 2.0

Adan stays ahead of SGD for the first 74 epochs, but afterwards its convergence slows down. How should I adjust the lr and other hyperparameters? #39

Closed liiicon closed 9 months ago

liiicon commented 1 year ago

optimizer = Adan(pg0, lr=1e-3, betas=(0.98, 0.92, 0.99), eps=1.0e-08, weight_decay=0.02)

[image: training curves; black is SGD, pink is Adan]

XingyuXie commented 1 year ago

What lr are you using for SGD? Also, you could try setting beta3 to 0.999, and increasing the lr and the number of warmup steps.
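As a rough illustration of the warmup suggestion (this is my own sketch, not code from the Adan repo; the schedule shape, step counts, and lr values are assumptions), a linear warmup followed by cosine decay can be written as a plain function that returns the lr for a given step, which can then be fed to any optimizer via a lambda-style scheduler:

```python
import math

def warmup_cosine_lr(step, base_lr=1e-3, warmup_steps=500,
                     total_steps=10000, min_lr=1e-5):
    """Linear warmup from ~0 to base_lr, then cosine decay to min_lr.

    Hypothetical defaults for illustration; tune warmup_steps and
    base_lr for your own training run.
    """
    if step < warmup_steps:
        # Linear ramp: step 0 starts near zero, reaches base_lr at the end.
        return base_lr * (step + 1) / warmup_steps
    # Cosine anneal over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

With PyTorch this could be plugged in via `torch.optim.lr_scheduler.LambdaLR` (dividing by `base_lr` to get a multiplicative factor), alongside the beta change suggested above, e.g. `betas=(0.98, 0.92, 0.999)`. A longer warmup mainly helps Adan's momentum statistics stabilize before the lr reaches its peak.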

liiicon commented 1 year ago

The SGD lr is 0.01. I will try your suggestions; thank you very much for your help!