Regardless of the width of the hidden layers, it seems that dropout training does not work once I have more than 3 hidden layers. I wonder if some bug is causing it.
4-layer with backprop only (no dropout) works:
$ python mlp.py backprop
... building the model: hidden layers [600, 200, 100, 100], dropout: False [0.0, 0.0, 0.0, 0.0, 0.0]
... training
epoch 1, test error 0.375 (300), learning_rate=1.0 (patience: 7448 iter 930) **
epoch 2, test error 0.375 (300), learning_rate=0.998 (patience: 7448 iter 1861)
epoch 3, test error 0.375 (300), learning_rate=0.996004 (patience: 7448 iter 2792)
epoch 4, test error 0.375 (300), learning_rate=0.994011992 (patience: 7448 iter 3723)
epoch 5, test error 0.375 (300), learning_rate=0.992023968016 (patience: 7448 iter 4654)
epoch 6, test error 0.3625 (290), learning_rate=0.99003992008 (patience: 7448 iter 5585) **
epoch 7, test error 0.33875 (271), learning_rate=0.98805984024 (patience: 22340.0 iter 6516) **
epoch 8, test error 0.3175 (254), learning_rate=0.986083720559 (patience: 26064.0 iter 7447) **
epoch 9, test error 0.32375 (259), learning_rate=0.984111553118 (patience: 29788.0 iter 8378)
epoch 10, test error 0.325 (260), learning_rate=0.982143330012 (patience: 29788.0 iter 9309)
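For reference, the learning_rate column above simply decays geometrically: it is multiplied by 0.998 each epoch, so lr(epoch) = 1.0 * 0.998**(epoch - 1). A minimal sketch of that schedule, with the decay factor read off the printed values rather than taken from the mlp.py source:

# Sketch of the learning-rate schedule visible in the log above (the 0.998
# factor is inferred from the printed values, not from the actual mlp.py).
initial_lr = 1.0
decay = 0.998
for epoch in range(1, 11):
    lr = initial_lr * decay ** (epoch - 1)
    print("epoch %d, learning_rate=%s" % (epoch, lr))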
To summarize:
4-layer with backprop only (no dropout): works
4-layer with dropout: does not work
3-layer with dropout: works
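One thing I would double-check when going from 3 to 4 layers is that every hidden layer gets its own fresh dropout mask and the matching rescaling. A minimal sketch of per-layer inverted dropout in plain NumPy, not the actual mlp.py code; dropout_forward and its parameters are hypothetical names:

import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, weights, biases, p_drop, train=True):
    # weights, biases, p_drop are per-layer lists; the loop should behave
    # identically for 3 hidden layers and for 4 or more.
    h = x
    for W, b, p in zip(weights, biases, p_drop):
        h = np.tanh(h @ W + b)
        if train and p > 0.0:
            mask = rng.binomial(1, 1.0 - p, size=h.shape)
            h = h * mask / (1.0 - p)  # inverted dropout: rescale during training
    return h

If the script instead scales the weights at prediction time (as in the original dropout formulation), the analogous check is that the scaling is applied to every layer, including the ones added beyond the third.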