Closed TouqeerAhmad closed 4 years ago
Wanted to confirm that SGDR (linked below) is the motivation behind the restarting strategy? And essentially we can use the weights in the yml file to control the min/max learning rate for each restart.
We have experimentally found that this strategy (cosine learning rate with restarts) improves both the convergence speed and the final performance, so we use it.
Some references:
[1] SGDR: Stochastic Gradient Descent with Warm Restarts
[2] Cyclical Learning Rates for Training Neural Networks
[3] Snapshot Ensembles: Train 1, Get M for Free
As you said, we can use different weights to control the learning rate of each restart. We simply found from our experiments that the current strategy works better.
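For concreteness, the schedule described in this thread (max lr 4e-4 decayed by a cosine to 1e-7 over a 150k-iteration period, then restarted) can be sketched as below. This is a minimal illustration, not the project's actual scheduler code; the function name `cosine_restart_lr` and the default period/weight values are assumptions chosen to match the numbers mentioned in the thread.

```python
import math

def cosine_restart_lr(iteration, base_lr=4e-4, eta_min=1e-7,
                      periods=(150_000, 150_000, 150_000, 150_000),
                      restart_weights=(1.0, 1.0, 1.0, 1.0)):
    """Cosine annealing with restarts (SGDR-style), sketched for illustration.

    Each entry in `periods` is the length of one cosine cycle; the matching
    entry in `restart_weights` scales the peak lr of that cycle, which is how
    a per-restart weight in the yml config would control the max lr.
    """
    cumulative = 0
    for period, weight in zip(periods, restart_weights):
        if iteration < cumulative + period:
            t = iteration - cumulative  # position within the current cycle
            # cosine decay from weight * base_lr down to eta_min
            return eta_min + weight * (base_lr - eta_min) * (
                1 + math.cos(math.pi * t / period)) / 2
        cumulative += period
    return eta_min  # past the last scheduled cycle
```

At iteration 0 this gives 4e-4, it approaches 1e-7 near iteration 150k, and at iteration 150k it jumps back to 4e-4 (the restart). Setting a `restart_weights` entry below 1.0 would make the corresponding restart peak lower.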
Thank you for your input, Xintao!
Hi,
Looking at the provided log files, the learning rate starts from 4e-4, decays to 1e-7 over 150k iterations, and then restarts from 4e-4. Can you please provide the reasoning behind this learning rate strategy, or perhaps a reference?
Thanks, Touqeer