Performance boost history:
Found out that this ensemble network has trouble differentiating 3's from 8's (and vice versa) and 7's from 9's (and vice versa). I will add two separate networks that specialize in these two pairs and have them verify the prediction whenever the ensemble outputs one of those four digits, roughly as sketched below.
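A rough sketch of the verification step I have in mind (the names `ensemble_predict`, `specialist_38` and `specialist_79` are placeholders, nothing that exists in the repo yet):

```python
def predict_with_verification(image, ensemble_predict, specialist_38, specialist_79):
    """Let a specialist binary net confirm the ensemble's guess for 3/8 and 7/9."""
    pred = ensemble_predict(image)      # int label 0-9 from the ensemble
    if pred in (3, 8):
        pred = specialist_38(image)     # binary net trained only on 3 vs 8
    elif pred in (7, 9):
        pred = specialist_79(image)     # binary net trained only on 7 vs 9
    return pred
```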
Apparently "going deeper" gives a better result:
A single FFNN with 4 hidden layers of 512 nodes each: ~98.2%
However, the above architecture turned out to be problematic with the way I had been initializing the parameters and the optimizer I had been using: for the earlier models I initialized the weights with "random normal" and trained with "GradientDescent", and with this deeper network that combination drove the cost to "nan".
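A minimal sketch of how the 4 x 512 network can be wired up with the Xavier initializer instead of random normal, assuming the TF 1.x API; I use Adam here only as an example of a less fragile optimizer than plain GradientDescent, not as the exact setup that produced the 98.2%:

```python
import tensorflow as tf  # TF 1.x style API

X = tf.placeholder(tf.float32, [None, 784])
Y = tf.placeholder(tf.float32, [None, 10])

# 4 hidden layers of 512 units; Xavier init replaces the tf.random_normal
# weights that, combined with GradientDescentOptimizer, led to a nan cost.
xavier = tf.contrib.layers.xavier_initializer()
h = X
for _ in range(4):
    h = tf.layers.dense(h, 512, activation=tf.nn.relu, kernel_initializer=xavier)
logits = tf.layers.dense(h, 10, kernel_initializer=xavier)

cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=logits))
train_op = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(cost)
```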
NOTES: need to study what each optimizer does, how the Xavier initializer differs from random normal, and when to use which optimizer and initializer.
For some reason, BN is not really helping (in fact, the networks using BN perform worse than the ones without it). Will investigate further.
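One thing worth ruling out first (a guess, not a diagnosis of the code in this repo): with TF 1.x's `tf.layers.batch_normalization` it is easy to forget the `training` flag or the `UPDATE_OPS` dependency, and either mistake makes BN look worse than no BN at evaluation time. A sketch of the usage I want to double-check against:

```python
import tensorflow as tf  # same TF 1.x assumption as above

X = tf.placeholder(tf.float32, [None, 784])
Y = tf.placeholder(tf.float32, [None, 10])
is_training = tf.placeholder(tf.bool)  # feed True for training, False for eval

h = X
for _ in range(4):
    h = tf.layers.dense(h, 512, use_bias=False,
                        kernel_initializer=tf.contrib.layers.xavier_initializer())
    h = tf.layers.batch_normalization(h, training=is_training)
    h = tf.nn.relu(h)
logits = tf.layers.dense(h, 10)

cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=logits))

# BN's moving mean/variance are only refreshed when UPDATE_OPS run with the
# train step; skipping this (or evaluating with is_training=True) is a common
# reason BN appears to hurt accuracy.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(cost)
```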