Closed TouqeerAhmad closed 4 years ago
Wanted to confirm that SGDR (linked below) is the motivation behind the restarting strategy? And essentially we can use the weights in the yml file to control the min/max learning rate for each restart.
We have experimentally found that this strategy (cosine learning rate with restarts) improves both the convergence speed and the final performance, so we use it.
Some references:
[1] SGDR: Stochastic Gradient Descent with Warm Restarts
[2] Cyclical Learning Rates for Training Neural Networks
[3] Snapshot Ensembles: Train 1, Get M for Free
As you said, we can use different weights to control the learning rate of each restart. We simply found from our experiments that the current strategy works better.
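For concreteness, the schedule described in this thread (max lr 4e-4 decayed by a cosine to 1e-7 over a 150k-iteration period, then restarted) can be sketched as below. This is a minimal illustration, not the project's actual scheduler code; the function name `cosine_restart_lr` and the default period/weight values are assumptions chosen to match the numbers mentioned in the thread.

```python
import math

def cosine_restart_lr(iteration, base_lr=4e-4, eta_min=1e-7,
                      periods=(150_000, 150_000, 150_000, 150_000),
                      restart_weights=(1.0, 1.0, 1.0, 1.0)):
    """Cosine annealing with restarts (SGDR-style), sketched for illustration.

    Each entry in `periods` is the length of one cosine cycle; the matching
    entry in `restart_weights` scales the peak lr of that cycle, which is how
    a per-restart weight in the yml config would control the max lr.
    """
    cumulative = 0
    for period, weight in zip(periods, restart_weights):
        if iteration < cumulative + period:
            t = iteration - cumulative  # position within the current cycle
            # cosine decay from weight * base_lr down to eta_min
            return eta_min + weight * (base_lr - eta_min) * (
                1 + math.cos(math.pi * t / period)) / 2
        cumulative += period
    return eta_min  # past the last scheduled cycle
```

At iteration 0 this gives 4e-4, it approaches 1e-7 near iteration 150k, and at iteration 150k it jumps back to 4e-4 (the restart). Setting a `restart_weights` entry below 1.0 would make the corresponding restart peak lower.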
Thank you for your input, Xintao!
Hi,
Looking at the provided log files, the learning rate starts from 4e-4, decays to 1e-7 over 150k iterations, and then restarts from 4e-4. Can you please provide the reasoning behind this learning rate strategy, or perhaps a reference?
Thanks, Touqeer