Closed — kastnerkyle closed this issue 11 years ago
I think what you're referring to is a sparsity constraint on the activations of the hidden units:
loss = ||V g(Wx) - x||_2 + a ||g(Wx)||_1
where W are the encoding weights, V are the decoding weights, and the second term implements a penalty on the hidden-unit activations.
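That loss can be sketched directly in NumPy (a minimal illustration, not the library's actual implementation; the choice of a sigmoid for `g` is an assumption):

```python
import numpy as np

def sparse_ae_loss(W, V, x, a=0.1):
    """Reconstruction error plus an L1 penalty on the hidden activations.

    W: encoding weights (hidden x input), V: decoding weights (input x hidden),
    x: input vector, a: weight on the sparsity penalty.
    """
    g = lambda z: 1.0 / (1.0 + np.exp(-z))  # hidden nonlinearity (assumed sigmoid)
    h = g(W @ x)                            # hidden activations g(Wx)
    recon = np.linalg.norm(V @ h - x)       # ||V g(Wx) - x||_2
    sparsity = a * np.abs(h).sum()          # a * ||g(Wx)||_1
    return recon + sparsity
```

Since the penalty term is nonnegative, increasing `a` can only increase the loss for a fixed set of weights, which is what pushes the optimizer toward sparse hidden codes.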
If that's what you're referring to, use the --hidden-l1 command-line flag to control the value of a.
It might make more sense to newcomers to have a SparseAutoencoder subclass, but that still wouldn't solve the problem of how to set the a parameter. Any thoughts?
This is exactly what I was looking for - just missed it because I had my blinders on! I kept looking at the Autoencoder cost directly, rather than the J shared by all nets. Oops.
As far as the a parameter is concerned (and parameters in general), I have been looking at two papers from ICML 2013: "No More Pesky Learning Rates" and "On the Importance of Initialization and Momentum in Deep Learning". Maybe implementing these will give some ideas?
I actually just implemented NAG from the second paper. :) I've been waiting to check it in until we get this cascaded trainer thing merged.
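For reference, NAG (Nesterov accelerated gradient) differs from classical momentum only in where the gradient is evaluated: at the "look-ahead" point rather than the current parameters. A minimal sketch (the learning rate and momentum values are illustrative, and this is not the actual checked-in implementation):

```python
def nag_update(w, v, grad_fn, lr=0.01, mu=0.9):
    """One Nesterov accelerated gradient step.

    w: current parameter, v: current velocity, grad_fn: gradient function.
    The key difference from classical momentum is that the gradient is
    evaluated at the look-ahead point w + mu*v instead of at w.
    """
    g = grad_fn(w + mu * v)  # gradient at the look-ahead position
    v = mu * v - lr * g      # velocity update
    return w + v, v          # parameter update


# usage: minimize f(w) = w**2 (gradient 2*w) starting from w = 5
w, v = 5.0, 0.0
for _ in range(300):
    w, v = nag_update(w, v, lambda x: 2.0 * x)
```

The look-ahead evaluation is what lets NAG correct the velocity a step earlier than classical momentum, which the Sutskever et al. paper argues is why it tolerates larger momentum coefficients.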
I think I read the learning rates paper but don't remember the details. I'll give it another look this week.
Going to close this one out.
Glad to hear you implemented NAG! It looks really promising.
Is there currently any way to add a sparsity constraint to the cost of an autoencoder? I see regularization terms (weight_l1, l2, etc.), but another tutorial also mentions an explicit sparsity term, which seems (to me at least) different from regularization based on the weights. However, other deep learning notes don't seem to include this parameter, at least not in the form shown in the link.
If we do need this functionality, I am thinking a separate SparseAutoencoder class might be better than adding construction options to the current autoencoder - what are your thoughts?