Starting with zeros for the bias is the norm in fully-connected layers.
It has recently been shown that one-initialized biases work better in LSTMs, though. But LSTMs are very different from feedforward dense nets.
Thanks for the quick answer @fchollet!
Do you know which paper states that?
Another question, not really related to the bias but to the shared_zeros in the SGD optimizer's get_updates:
m = shared_zeros(p.get_value().shape)  # momentum
v = self.momentum * m - lr * g  # velocity
Since m == 0, doesn't that mean self.momentum * m == 0, and therefore v = -lr * g?
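A plain-NumPy illustration of how that shared momentum buffer behaves across steps may help here; this is only a sketch of the update rule, not the actual Keras/Theano code, and the learning rate, momentum, and gradients below are made up:

```python
import numpy as np

# Sketch of the SGD-with-momentum update; m plays the role of the Theano
# shared variable created by shared_zeros, so it is zero only at step 0.
lr, momentum = 0.1, 0.9
p = np.array([1.0, -2.0])      # a parameter vector
m = np.zeros_like(p)           # momentum buffer, starts at zero

for step in range(3):
    g = np.array([0.5, 0.5])   # pretend gradient for this step
    v = momentum * m - lr * g  # velocity; at step 0 this is just -lr * g
    p = p + v                  # parameter update
    m = v                      # the shared buffer is overwritten each step,
                               # so momentum * m is non-zero from step 1 on
    print(step, v)
```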
Re: which paper, whilst it was introduced far earlier (Gers et al., 2000), the improvements were again shown in a variety of tasks in An Empirical Exploration of Recurrent Network Architectures (Jozefowicz et al., 2015): "We found that adding a bias of 1 to the LSTM’s forget gate closes the gap between the LSTM and the GRU."
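For reference, later Keras releases bake this in: the LSTM layer has a flag that adds 1 to the forget-gate bias at initialization. A minimal sketch, assuming a Keras 2-style API where the argument is called unit_forget_bias (older releases exposed forget_bias_init instead):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

# unit_forget_bias=True initializes the forget-gate bias to 1, following
# Gers et al. (2000) and Jozefowicz et al. (2015).
model = Sequential([
    LSTM(64, input_shape=(100, 16), unit_forget_bias=True),
    Dense(1),
])
```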
Thank you, @Smerity!
Do you know the answer to my previous question about the velocity calculation?
@fchollet, I wouldn't be surprised that initializing biases to zero is standard, but it seems to be unlikely to be optimal for many normal problems one might want to solve with FC NNs. For instance, consider regression with relu-style activations. If the biases are initialized to zero, the half-planes of positive activation specified by each neuron are starting in a non-generic configuration (the bounding hyperplanes all pass through the origin)! This seems like a terrible way to start finding a solution to the regression problem since you would want the positive domains to have irregular overlaps.
It would be nice to have options for bias initialization.
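For what it's worth, here is a minimal sketch of one way to do that, assuming a Keras version whose Dense layer accepts a bias_initializer argument (the Theano-era API discussed in this thread only exposed init for the weights):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.initializers import RandomUniform

# Small random biases so the ReLU half-planes do not all pass through the
# origin at the start of training (see the regression argument above).
model = Sequential([
    Dense(64, activation='relu', input_dim=10,
          bias_initializer=RandomUniform(minval=-0.5, maxval=0.5)),
    Dense(1),
])
```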
Or add a Bias core layer. In this famous bioinformatics paper: http://www.nature.com/nbt/journal/v33/n8/extref/nbt.3300-S2.pdf the bias of the last Dense layer is initialized to -4.0 because the dataset has a background bias:
deepbind_model = [
    motif_scan(num_motifs = 16, motif_len = 24,
               weight_decay = loguniform(1e-10, 1e-3),
               init_scale = loguniform(1e-7, 1e-3)),
    bias(),
    rectify(),
    maxpool(),
    full(num_units = 32,
         weight_decay = loguniform(1e-10, 1e-3),
         init_scale = loguniform(1e-5, 1e-2)),
    rectify(),
    dropout(expected_value = choice([0.5, 0.75, 1.0])),
    full(num_units = 1,
         weight_decay = loguniform(1e-10, 1e-3),
         init_scale = loguniform(1e-5, 1e-2)),
    bias(init_bias = -4.0),
]
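A rough Keras equivalent of that last step might look like the sketch below, assuming a Dense layer that accepts bias_initializer and keras.initializers.Constant; the paper itself used its own calibration framework rather than Keras:

```python
from keras.layers import Dense
from keras.initializers import Constant

# Final single-unit layer with its bias started at -4.0, mirroring the
# background bias correction in the DeepBind supplementary above.
output_layer = Dense(1, bias_initializer=Constant(value=-4.0))
```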
Nice find, SunYu!
Hello!
I was looking at my network weights when I realized that all the bias values in a Dense layer were 0s. I saw in the Dense layer initialization: self.b = shared_zeros((self.output_dim)). Shouldn't it be self.b = shared_ones((self.output_dim))?
I don't know if I misunderstood something, or if there is code elsewhere that changes it. I'm kinda new to Theano.