Proposes Snapshot Ensembles: building an ensemble of models in a single training run using a cyclical learning rate
Improves performance of CNNs on CIFAR-10 and CIFAR-100
Details
Motivation
Traditional SGD with a monotonically decreasing learning rate converges to a single local minimum, which may not be optimal
Visiting multiple local minima via a cyclical learning rate and ensembling the resulting snapshots yields better performance at the cost of only a single training run
Related Works
Implicit Ensemble
Dropout, DropConnect, Stochastic Depth, and Swapout all drop parts of the neural network during training to obtain an 'implicit' ensemble effect
Explicit Ensemble
multiple-checkpoint ensembles, ensembles over multiple training runs, boosting, etc.
this paper's contribution is generating an explicit ensemble in a single training run
Cyclical Learning Rate
a cyclical learning rate repeatedly anneals toward zero (where a snapshot is saved) and then restarts to a large value, letting the model escape the current minimum and converge to a different one
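The paper's schedule is shifted cosine annealing restarted every cycle. A minimal sketch (parameter names `T`, `M`, `alpha0` follow the paper's notation; the toy values below are illustrative, not the paper's settings):

```python
import math

def snapshot_lr(t, T, M, alpha0):
    """Shifted cosine-annealing learning rate:
    alpha(t) = alpha0/2 * (cos(pi * ((t-1) mod ceil(T/M)) / ceil(T/M)) + 1)
    t: 1-based iteration, T: total iterations, M: number of cycles (= snapshots)."""
    cycle_len = math.ceil(T / M)
    return alpha0 / 2 * (math.cos(math.pi * ((t - 1) % cycle_len) / cycle_len) + 1)

# LR restarts to alpha0 at the start of each cycle and anneals toward 0;
# a snapshot of the weights is saved at the end of each cycle (lowest LR).
schedule = [snapshot_lr(t, T=300, M=6, alpha0=0.1) for t in range(1, 301)]
```

With T=300 and M=6, the rate restarts every 50 iterations, so six models are snapshotted from one run.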
Results
better performance than a naive single model, dropout, etc.
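At test time the paper averages the softmax outputs of the saved snapshots. A minimal NumPy sketch of that averaging (the toy probabilities are made up for illustration):

```python
import numpy as np

def ensemble_predict(snapshot_probs):
    """Average the softmax outputs of m snapshots.
    snapshot_probs: array of shape (m, n_samples, n_classes)."""
    return np.mean(snapshot_probs, axis=0)

# Toy example: the two snapshots disagree on sample 0 (argmax 0 vs. 1);
# averaging their softmax outputs resolves it toward class 1.
p1 = np.array([[0.6, 0.4], [0.2, 0.8]])
p2 = np.array([[0.3, 0.7], [0.1, 0.9]])
avg = ensemble_predict(np.stack([p1, p2]))
preds = avg.argmax(axis=1)
```

Averaging probabilities (rather than hard votes) keeps the ensemble output a valid distribution over classes.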
Activation Space
correlation of softmax outputs between snapshots reveals that Snapshot Ensemble models are all meaningfully different, whereas snapshots 4, 5, and 6 under traditional lr scheduling have almost-identical output distributions
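This diversity check can be approximated by Pearson correlation between flattened softmax outputs of snapshot pairs; values near 1 mean near-identical predictive behavior. A hypothetical sketch (the helper name and toy arrays are mine, not from the paper):

```python
import numpy as np

def softmax_correlation(probs_a, probs_b):
    """Pearson correlation between two snapshots' flattened softmax outputs.
    Values near 1.0 indicate near-identical predictive distributions."""
    return np.corrcoef(probs_a.ravel(), probs_b.ravel())[0, 1]

# A snapshot correlates perfectly with itself; a genuinely different
# snapshot produces a noticeably lower correlation.
p = np.array([[0.7, 0.3], [0.2, 0.8]])
q = np.array([[0.4, 0.6], [0.5, 0.5]])
same_corr = softmax_correlation(p, p)
diff_corr = softmax_correlation(p, q)
```

Low pairwise correlation is the signal that the cyclical schedule actually reached distinct minima rather than revisiting the same one.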
Link : https://arxiv.org/pdf/1704.00109.pdf
Authors : Huang et al. 2017