Starting with zeros for the bias is the norm in fully-connected layers.
It has recently been shown that one-initialized biases work better in LSTMs, though. But LSTMs are very different from feedforward dense nets.
Thanks for the quick answer @fchollet!
Do you know which paper states that?
Another question, not really related to the bias but to the shared_zeros in the SGD optimizer's get_updates:
m = shared_zeros(p.get_value().shape)  # momentum
v = self.momentum * m - lr * g  # velocity
Since m == 0, doesn't that mean self.momentum * m == 0, and therefore v = -lr * g?
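A plain-NumPy illustration of how that shared momentum buffer behaves across steps may help here; this is only a sketch of the update rule, not the actual Keras/Theano code, and the learning rate, momentum, and gradients below are made up:

```python
import numpy as np

# Sketch of the SGD-with-momentum update; m plays the role of the Theano
# shared variable created by shared_zeros, so it is zero only at step 0.
lr, momentum = 0.1, 0.9
p = np.array([1.0, -2.0])      # a parameter vector
m = np.zeros_like(p)           # momentum buffer, starts at zero

for step in range(3):
    g = np.array([0.5, 0.5])   # pretend gradient for this step
    v = momentum * m - lr * g  # velocity; at step 0 this is just -lr * g
    p = p + v                  # parameter update
    m = v                      # the shared buffer is overwritten each step,
                               # so momentum * m is non-zero from step 1 on
    print(step, v)
```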
Re: which paper, whilst it was introduced far earlier (Gers et al., 2000), the improvements were again shown in a variety of tasks in An Empirical Exploration of Recurrent Network Architectures (Jozefowicz et al., 2015): "We found that adding a bias of 1 to the LSTM’s forget gate closes the gap between the LSTM and the GRU."
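For reference, later Keras releases bake this in: the LSTM layer has a flag that adds 1 to the forget-gate bias at initialization. A minimal sketch, assuming a Keras 2-style API where the argument is called unit_forget_bias (older releases exposed forget_bias_init instead):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

# unit_forget_bias=True initializes the forget-gate bias to 1, following
# Gers et al. (2000) and Jozefowicz et al. (2015).
model = Sequential([
    LSTM(64, input_shape=(100, 16), unit_forget_bias=True),
    Dense(1),
])
```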
Thank you, @Smerity!
Do you know the answer to my previous question about the velocity calculation?
@fchollet, I wouldn't be surprised that initializing biases to zero is standard, but it seems to be unlikely to be optimal for many normal problems one might want to solve with FC NNs. For instance, consider regression with relu-style activations. If the biases are initialized to zero, the half-planes of positive activation specified by each neuron are starting in a non-generic configuration (the bounding hyperplanes all pass through the origin)! This seems like a terrible way to start finding a solution to the regression problem since you would want the positive domains to have irregular overlaps.
It would be nice to have options for bias initialization.
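For what it's worth, here is a minimal sketch of one way to do that, assuming a Keras version whose Dense layer accepts a bias_initializer argument (the Theano-era API discussed in this thread only exposed init for the weights):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.initializers import RandomUniform

# Small random biases so the ReLU half-planes do not all pass through the
# origin at the start of training (see the regression argument above).
model = Sequential([
    Dense(64, activation='relu', input_dim=10,
          bias_initializer=RandomUniform(minval=-0.5, maxval=0.5)),
    Dense(1),
])
```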
Or add a Bias core layer. In this famous bioinformatics paper: http://www.nature.com/nbt/journal/v33/n8/extref/nbt.3300-S2.pdf the bias of the last Dense layer is initialized to -4.0 because the dataset has a background bias:
deepbind_model = [
    motif_scan(num_motifs = 16, motif_len = 24,
               weight_decay = loguniform(1e-10, 1e-3),
               init_scale = loguniform(1e-7, 1e-3)),
    bias(),
    rectify(),
    maxpool(),
    full(num_units = 32,
         weight_decay = loguniform(1e-10, 1e-3),
         init_scale = loguniform(1e-5, 1e-2)),
    rectify(),
    dropout(expected_value = choice([0.5, 0.75, 1.0])),
    full(num_units = 1,
         weight_decay = loguniform(1e-10, 1e-3),
         init_scale = loguniform(1e-5, 1e-2)),
    bias(init_bias = -4.0),
]
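A rough Keras equivalent of that last step might look like the sketch below, assuming a Dense layer that accepts bias_initializer and keras.initializers.Constant; the paper itself used its own calibration framework rather than Keras:

```python
from keras.layers import Dense
from keras.initializers import Constant

# Final single-unit layer with its bias started at -4.0, mirroring the
# background bias correction in the DeepBind supplementary above.
output_layer = Dense(1, bias_initializer=Constant(value=-4.0))
```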
Nice find, SunYu!
Hello!
I was looking at my network weights when I realized that all the bias values in a Dense layer were 0s. I saw in the Dense layer initialization: self.b = shared_zeros((self.output_dim)). Shouldn't it be self.b = shared_ones((self.output_dim))?
I don't know if I misunderstood something, or if there is code elsewhere that changes it. I'm kinda new to Theano.