That would be great. Newbob came out of ICSI, right? Did you do it?
On Aug 23, 2018, at 8:21 AM, Eric Fosler-Lussier notifications@github.com wrote:
I noticed, particularly when training big nets, that the default learning rate mechanism is pretty inefficient. By and large, after the initial burn-in and first rate drop, the system spends at most one fruitful epoch at each learning rate before having to backtrack. The "newbob" rate schedule is probably more efficient than the default (which is typically flat until no improvement, then halves until no improvement, then stops). One could have the same burn-in period as well.
I'm planning to implement a separate learning rate scheduler module that can be switched in, which will allow for different implementations. I'll keep the current as a default, but we may want to consider switching to newbob at some point.
Plus I love the idea of "newbob" continuing to live on...
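For concreteness, the default schedule described above boils down to something like the following minimal Python sketch (names are illustrative, not the actual trainer code):

# Hypothetical sketch of the default schedule: flat until no improvement,
# then halving after every epoch until improvement stalls again, then stop.
def default_next_rate(rate, prev_ter, ter, halving, threshold=0.0):
    """Return (new_rate, halving, keep_training) for the next epoch."""
    improvement = prev_ter - ter          # drop in validation TER
    if halving:
        if improvement < threshold:       # stalled again while halving: stop
            return rate, halving, False
        return rate / 2.0, halving, True  # keep halving each epoch
    if improvement < threshold:           # first stall: begin halving
        return rate / 2.0, True, True
    return rate, False, True              # still improving: stay flat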
ICSI, yes, but it wasn't me - it was a programmer that Morgan had hired back when we had an NN trainer called BoB (Boxes of Boxes). He discovered the same kind of trend and just made "newbob" the default, since it was the second thing he tried. The name carried over when Dave coded up quicknet.
I just finished the --roll deprecation; I'll code this up today and test it out, and then "roll" those two changes together once the other pull request gets sorted out.
Hi Eric,
Thanks for bringing this up. Yes, I love the idea of having a different module (maybe a class) that can control the learning rate schedule much more efficiently and in a more modular way. Please let me know if I can help with that.
Here is where our scheduler logic is concentrated:
https://github.com/srvk/eesen/blob/tf_clean/tf/ctc-am/tf/tf_train.py#L132
Is this the newbob idea?
--lrate="D:l:c:dh,ds:n"
starts with the learning rate l; if the validation error reduction between two consecutive epochs is less than dh, the learning rate is scaled by c during each of the remaining epochs. Training finally terminates when the validation error reduction between two consecutive epochs falls below ds. n is the minimum epoch number after which scaling can be performed.
If so, I think we have some of the initial ideas implemented already. We currently have the initial burn-in period, and we load the best previous weights. So we still need to implement the check that the error reduction between two consecutive epochs is less than a threshold, where we can expose these two thresholds (dh and ds) as additional hyperparameters.
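As a concrete illustration, that --lrate string could be parsed along these lines (a hypothetical sketch, not the actual PDNN or eesen code):

# Hypothetical parser for a PDNN-style lrate string, e.g. "D:0.08:0.5:0.05,0.05:8".
def parse_lrate(spec):
    algorithm, l, c, thresholds, n = spec.split(":")
    dh, ds = thresholds.split(",")
    return {
        "algorithm": algorithm,           # D: the scheduler type
        "initial_rate": float(l),         # starting learning rate
        "scale": float(c),                # factor c applied once scaling begins
        "halving_threshold": float(dh),   # start scaling when improvement < dh
        "stopping_threshold": float(ds),  # stop training when improvement < ds
        "min_epochs": int(n),             # epochs before scaling may start
    }

print(parse_lrate("D:0.08:0.5:0.05,0.05:8"))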
Yep, that seems about right. What is D?
How about this: I'm about halfway through refactoring the code (I've called the old lrscheduler "Halvsies").
Awesome, works for me. Thanks, Eric!
I just took the explanations from Yajie:
https://www.cs.cmu.edu/~ymiao/pdnntk/lrate.html
D was the type of scheduler. But I think with --lrate_algorithm we are good to go.
OK, I have three schedulers working (Halvsies - the current schedule, Newbob, and Constantlr). Everything checks out for new runs. The code does not quite do the right thing for restarts, so I will fix that tomorrow and then make a pull request.
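For anyone following along, the Newbob logic amounts to roughly the sketch below; class and method names are illustrative, not the actual code in the pull request:

# Rough, hypothetical sketch of a pluggable Newbob-style scheduler.
class Newbob:
    def __init__(self, initial_rate, scale=0.75, min_epochs=3,
                 ramp_threshold=0.0, stop_threshold=0.0):
        self.rate = initial_rate
        self.scale = scale                  # multiplier applied while ramping
        self.min_epochs = min_epochs        # burn-in: hold the rate this long
        self.ramp_threshold = ramp_threshold
        self.stop_threshold = stop_threshold
        self.ramping = False

    def next_rate(self, epoch, prev_ter, ter):
        """Return the rate for the next epoch, or None to stop training."""
        if epoch <= self.min_epochs:
            return self.rate                # burn-in: never touch the rate
        improvement = prev_ter - ter        # drop in validation TER
        if self.ramping:
            if improvement < self.stop_threshold:
                return None                 # stalled while ramping: stop
            self.rate *= self.scale         # keep decaying every epoch
        elif improvement < self.ramp_threshold:
            self.ramping = True             # first stall: begin the ramp
            self.rate *= self.scale         # (the trainer also restores the
                                            # best previous weights here)
        return self.rate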
Created pull request #198 which implements the new lrscheduler objects.
Awesome. Thank you very much again, Eric. Will merge it now, thanks.
There is one thing that keeps bugging me. When restarting the training, shouldn't we consider the previous TER, so we know at which point we are in the ramping phase?
Heh. What I did was tag the particular lrscheduler messages in the log file so you know what phase you're in - at restart, the log files are read at the same time the TER is recalculated from the relevant log file. For example (pulled from the output rather than the log file, using a rather odd set of parameters, since I was making sure that stuff worked right):
[2018-08-24 15:39:32] Epoch 1 starting, learning rate: 0.06
[2018-08-24 15:39:57] Epoch 1 finished in 0 minutes
Train cost: 303.2, ter: 87.7%, #example: 297
Validate cost: 182.0, ter: 88.9%, #example: 30
[2018-08-24 15:39:57] LRScheduler.Newbob: not updating learning rate for first 3 epochs
--------------------------------------------------------------------------------
[2018-08-24 15:39:57] Epoch 2 starting, learning rate: 0.06
[2018-08-24 15:40:21] Epoch 2 finished in 0 minutes
Train cost: 206.1, ter: 89.1%, #example: 297
Validate cost: 117.2, ter: 92.8%, #example: 30
[2018-08-24 15:40:21] LRScheduler.Newbob: not updating learning rate for first 3 epochs
--------------------------------------------------------------------------------
[2018-08-24 15:40:21] Epoch 3 starting, learning rate: 0.06
[2018-08-24 15:40:45] Epoch 3 finished in 0 minutes
Train cost: 166.3, ter: 94.2%, #example: 297
Validate cost: 140.0, ter: 97.3%, #example: 30
[2018-08-24 15:40:45] LRScheduler.Newbob: not updating learning rate for first 3 epochs
--------------------------------------------------------------------------------
[2018-08-24 15:40:45] Epoch 4 starting, learning rate: 0.06
[2018-08-24 15:41:10] Epoch 4 finished in 0 minutes
Train cost: 154.5, ter: 96.6%, #example: 297
Validate cost: 108.3, ter: 96.4%, #example: 30
[2018-08-24 15:41:10] LRScheduler.Newbob: learning rate remaining constant 0.06, TER improved 0.9% from epoch 3
--------------------------------------------------------------------------------
[2018-08-24 15:41:10] Epoch 5 starting, learning rate: 0.06
[2018-08-24 15:41:34] Epoch 5 finished in 0 minutes
Train cost: 143.0, ter: 94.7%, #example: 297
Validate cost: 97.5, ter: 98.3%, #example: 30
[2018-08-24 15:41:34] LRScheduler.Newbob: beginning ramping to learn rate 0.045, TER difference -1.9% under threshold 0.0% from epoch 4
restoring model from epoch 4
--------------------------------------------------------------------------------
[2018-08-24 15:41:35] Epoch 6 starting, learning rate: 0.045
[2018-08-24 15:41:59] Epoch 6 finished in 0 minutes
Train cost: 143.2, ter: 94.8%, #example: 297
Validate cost: 107.6, ter: 96.2%, #example: 30
[2018-08-24 15:41:59] LRScheduler.Newbob: learning rate ramping to 0.03375, TER improved 0.3% from epoch 4
--------------------------------------------------------------------------------
[2018-08-24 15:41:59] Epoch 7 starting, learning rate: 0.03375
[2018-08-24 15:42:23] Epoch 7 finished in 0 minutes
Train cost: 140.8, ter: 95.6%, #example: 297
Validate cost: 104.7, ter: 97.2%, #example: 30
[2018-08-24 15:42:23] LRScheduler.Newbob: stopping training, TER difference -1.0% under threshold 0.0% from epoch 6
restoring model from epoch 6
(Basically, the idea occurred to me when I saw the code looking for VALIDATE tags at restart to get the TER.)
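Recovering the ramping state at restart could then look roughly like the following hypothetical sketch, keyed to the tagged messages shown above (not the actual restart code):

import re

# Scan a training log for the tagged Newbob messages to recover the phase.
TAG = re.compile(r"LRScheduler\.Newbob: "
                 r"(?:beginning ramping to learn rate|learning rate ramping to) "
                 r"([\d.]+)")

def recover_phase(log_path):
    """Return (ramping, last_ramped_rate) from the most recent tagged line."""
    ramping, rate = False, None
    with open(log_path) as f:
        for line in f:
            m = TAG.search(line)
            if m:
                ramping, rate = True, float(m.group(1))
    return ramping, rate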
Updated with a patch to the newbob scheduler (#202); closing the issue.