jakezhaojb / DSGA-1008-Spring2017-A2

Learning rate decay #1

Open mullachv opened 7 years ago

mullachv commented 7 years ago

Jake, line 161 should divide by 4.0 rather than 4 (float vs. int): https://github.com/jakezhaojb/DSGA-1008-Spring2017-A2/blob/master/main.py#L161

Otherwise, when the learning rate is an integer, Python 2 performs integer division, so the learning rate drops to zero and stays there.
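To illustrate the difference, here is a minimal sketch. It assumes the learning rate starts as the integer 1 and is divided by 4 on each decay step; the actual statement on line 161 is not reproduced here, and the helper names `decay_floor` and `decay_true` are hypothetical. Floor division (`//`) stands in for what Python 2's `/` does with two integers, so the snippet behaves the same on Python 2 and Python 3.

```python
def decay_floor(lr, factor=4):
    """Mimic Python 2's `lr / 4` on integer operands (floor division)."""
    return lr // factor


def decay_true(lr, factor=4.0):
    """Force a float divisor so the quotient is never truncated."""
    return lr / float(factor)


# Hypothetical starting value: an integer learning rate of 1.
print(decay_floor(1))             # 0: the rate collapses to zero in one step
print(decay_true(1))              # 0.25
print(decay_true(decay_true(1)))  # 0.0625: it keeps shrinking, never sticks at 0
```

Once the rate reaches zero, every further division still yields zero, so training can no longer update the weights.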

mullachv commented 7 years ago

| epoch  14 |  2000/ 2323 batches | lr 1.00 | ms/batch 132.84 | loss  3.93 | ppl    51.05
| epoch  14 |  2200/ 2323 batches | lr 1.00 | ms/batch 130.91 | loss  3.87 | ppl    47.75
-----------------------------------------------------------------------------------------
| end of epoch  14 | time: 326.76s | valid loss  4.84 | valid ppl   126.41
-----------------------------------------------------------------------------------------
| epoch  15 |   200/ 2323 batches | lr 0.00 | ms/batch 133.49 | loss  4.12 | ppl    61.81
| epoch  15 |   400/ 2323 batches | lr 0.00 | ms/batch 131.63 | loss  4.27 | ppl    71.50
-----------------------------------------------------------------------------------------
| end of epoch  15 | time: 315.69s | valid loss  4.84 | valid ppl   126.41
-----------------------------------------------------------------------------------------
| epoch  16 |   200/ 2323 batches | lr 0.00 | ms/batch 131.21 | loss  4.12 | ppl    61.81
| epoch  16 |   400/ 2323 batches | lr 0.00 | ms/batch 134.99 | loss  4.27 | ppl    71.50

jakezhaojb commented 7 years ago

Good catch. Can you submit a PR to the master branch?