Closed kastnerkyle closed 10 years ago
Wow, this is an old one -- sorry for letting it drop!
I don't really have anything insightful to say about this, except that the deep autoencoder example you contributed seems to work fairly well at the moment, even with logistic activations and SGD training. My go-to solution for deep networks that don't seem to train well all at once is to reach for "relu" activations (instead of sigmoid), and then to try a higher-order trainer, like the CG or HF trainers.
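For intuition on why relu helps here, a quick numpy sketch (not theanets code, just an illustration): the sigmoid derivative is at most 0.25, so a gradient pushed back through many sigmoid layers shrinks geometrically, while relu's derivative is exactly 1 for active units. The layer widths and random weights below are arbitrary assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def grad_norm_through_layers(activation_deriv, n_layers=8, width=100):
    """Push a gradient backwards through n_layers random linear+activation
    layers and return its final norm. Weights use 1/sqrt(width) scaling so
    the linear part roughly preserves norm; any shrinkage comes from the
    activation derivative."""
    g = np.ones(width)
    for _ in range(n_layers):
        W = rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
        pre = rng.normal(size=width)  # stand-in pre-activations
        g = (W.T @ g) * activation_deriv(pre)
    return np.linalg.norm(g)

sig_deriv = lambda x: sigmoid(x) * (1 - sigmoid(x))   # at most 0.25
relu_deriv = lambda x: (x > 0).astype(float)          # 0 or 1

print("sigmoid:", grad_norm_through_layers(sig_deriv))
print("relu:   ", grad_norm_through_layers(relu_deriv))
```

After 8 layers the sigmoid gradient norm comes out several orders of magnitude smaller than the relu one, which matches the "train well all at once" problem with deep sigmoid stacks.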
Thanks - I am pretty sure this is less "bug in code" and more "mathematical limitation in neural networks". Closing!
This gist sums up what I am seeing. When I try to build an overcomplete autoencoder, the sparsity for all decode layers shows up as 1.0, and the cost gets "stuck" at ~87 (because the gradient can't flow backwards through totally saturated layers?)
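A small numpy sketch of that "stuck" failure mode (again an illustration, not theanets internals): if a decode layer's sigmoid units all saturate at 1.0, the local gradient sigmoid(x) * (1 - sigmoid(x)) is nearly zero for every unit, so almost nothing propagates back and the cost stops moving. The layer width and pre-activation value are made-up numbers for the demo.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Pretend every unit in a 500-wide decode layer received a large positive
# pre-activation, so its output pins at ~1.0 (sparsity reads as 1.0).
pre = np.full(500, 12.0)
act = sigmoid(pre)            # all outputs ~0.999994
local_grad = act * (1 - act)  # sigmoid derivative, ~6e-6 per unit

print("min activation:", act.min())
print("max local gradient:", local_grad.max())
```

With per-unit gradients around 1e-6, weight updates into and below that layer are effectively zero, which is consistent with the cost plateauing rather than diverging.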
I encountered this while trying to build the canonical 784-1000-500-250-30-350-500-1000-784 deep autoencoder for MNIST digits - didn't have time to explore or recreate it until now. Any thoughts?