We're currently using adaptive SGD (mostly RMSprop) and hoping that the default parameters work for us. Many recent (and not-so-recent) deep learning papers have a schedule for decreasing the learning rate over time, often phrased something like "halve the learning rate every 50 epochs". An extreme case is Densely Connected Convolutional Networks, where each 10x learning rate drop is accompanied by a major decrease in loss.
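To make the "halve the learning rate every 50 epochs" style of schedule concrete, here's a minimal step-decay sketch in plain Python (the names are illustrative and not tied to our code or any particular framework):

```python
def step_decay(initial_lr, drop_factor, drop_every, epoch):
    """Step-decay schedule: multiply the learning rate by
    `drop_factor` once every `drop_every` epochs."""
    return initial_lr * (drop_factor ** (epoch // drop_every))

# "Halve the learning rate every 50 epochs", starting from 0.01:
# epochs 0-49 -> 0.01, 50-99 -> 0.005, 100-149 -> 0.0025, ...
lr = step_decay(0.01, 0.5, 50, epoch=120)  # 0.0025
```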
I imagine the following three parameters would be a useful addition to the cross-validation loop used to determine optimal hyperparameters:
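For example, assuming the three parameters are the initial learning rate, the decay factor, and the decay interval in epochs (my guess at the natural parameterisation; the names and values below are purely illustrative), they could be enumerated alongside the existing search roughly like this:

```python
import itertools

# Illustrative grid over the three schedule hyperparameters:
grid = {
    "initial_lr":  [1e-2, 1e-3, 1e-4],  # starting learning rate
    "drop_factor": [0.5, 0.1],          # e.g. halve, or drop by 10x
    "drop_every":  [25, 50],            # epochs between drops
}

for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    # Plug `params` into the existing cross-validation loop here;
    # `train_and_validate` is a stand-in, not an existing function.
    # score = train_and_validate(**params)
    print(params)
```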
I played around with the learning rate informally and did not see opportunities for easy wins, although I didn't explore the full space of decay schedules, etc. Closing for now.