Proposes Snapshot Ensembles: building an ensemble of models in a single training run using a cyclical learning rate
Improves performance of CNNs on CIFAR-10 and CIFAR-100
Details
Motivation
Traditional SGD with a monotonically decreasing learning rate converges to a single local minimum, which may not be optimal
Visiting multiple local minima via a cyclical learning rate and ensembling the resulting snapshots yields better performance at the cost of only a single training run
Related Works
Implicit Ensemble
Dropout, DropConnect, Stochastic Depth, and Swapout all drop parts of the neural network during training to obtain an 'implicit' ensemble effect
Explicit Ensemble
multiple-checkpoint ensembles, ensembles over multiple training runs, boosting, etc.
this paper's contribution is generating an explicit ensemble in a single training run
Cyclical Learning Rate
a cyclical learning rate repeatedly anneals toward zero (where a snapshot is saved) and then restarts to a large value, letting the model escape the current minimum and converge to a different one
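The paper's schedule is shifted cosine annealing restarted every cycle. A minimal sketch (parameter names `T`, `M`, `alpha0` follow the paper's notation; the toy values below are illustrative, not the paper's settings):

```python
import math

def snapshot_lr(t, T, M, alpha0):
    """Shifted cosine-annealing learning rate:
    alpha(t) = alpha0/2 * (cos(pi * ((t-1) mod ceil(T/M)) / ceil(T/M)) + 1)
    t: 1-based iteration, T: total iterations, M: number of cycles (= snapshots)."""
    cycle_len = math.ceil(T / M)
    return alpha0 / 2 * (math.cos(math.pi * ((t - 1) % cycle_len) / cycle_len) + 1)

# LR restarts to alpha0 at the start of each cycle and anneals toward 0;
# a snapshot of the weights is saved at the end of each cycle (lowest LR).
schedule = [snapshot_lr(t, T=300, M=6, alpha0=0.1) for t in range(1, 301)]
```

With T=300 and M=6, the rate restarts every 50 iterations, so six models are snapshotted from one run.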
Results
better performance than a naive single model, dropout, etc.
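At test time the paper averages the softmax outputs of the saved snapshots. A minimal NumPy sketch of that averaging (the toy probabilities are made up for illustration):

```python
import numpy as np

def ensemble_predict(snapshot_probs):
    """Average the softmax outputs of m snapshots.
    snapshot_probs: array of shape (m, n_samples, n_classes)."""
    return np.mean(snapshot_probs, axis=0)

# Toy example: the two snapshots disagree on sample 0 (argmax 0 vs. 1);
# averaging their softmax outputs resolves it toward class 1.
p1 = np.array([[0.6, 0.4], [0.2, 0.8]])
p2 = np.array([[0.3, 0.7], [0.1, 0.9]])
avg = ensemble_predict(np.stack([p1, p2]))
preds = avg.argmax(axis=1)
```

Averaging probabilities (rather than hard votes) keeps the ensemble output a valid distribution over classes.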
Activation Space
correlation of softmax outputs between snapshots reveals that Snapshot Ensemble models are all meaningfully different, whereas snapshots 4, 5, and 6 under traditional lr scheduling have almost-identical output distributions
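This diversity check can be approximated by Pearson correlation between flattened softmax outputs of snapshot pairs; values near 1 mean near-identical predictive behavior. A hypothetical sketch (the helper name and toy arrays are mine, not from the paper):

```python
import numpy as np

def softmax_correlation(probs_a, probs_b):
    """Pearson correlation between two snapshots' flattened softmax outputs.
    Values near 1.0 indicate near-identical predictive distributions."""
    return np.corrcoef(probs_a.ravel(), probs_b.ravel())[0, 1]

# A snapshot correlates perfectly with itself; a genuinely different
# snapshot produces a noticeably lower correlation.
p = np.array([[0.7, 0.3], [0.2, 0.8]])
q = np.array([[0.4, 0.6], [0.5, 0.5]])
same_corr = softmax_correlation(p, p)
diff_corr = softmax_correlation(p, q)
```

Low pairwise correlation is the signal that the cyclical schedule actually reached distinct minima rather than revisiting the same one.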
Link : https://arxiv.org/pdf/1704.00109.pdf
Authors : Huang et al. 2017