Closed. nbansal90 closed this issue 6 years ago.
Hi! Mainly, as far as I know, "applying weight decay to the bias units usually makes only a small difference to the final network". So I kept the bias terms in the regularization just for simplicity. You may try training the network without bias-term regularization and reopen the issue with the results.
Thank you for your effort, it is really helping me with my project. Illarion, I had a small doubt about the implementation.
When applying L2 regularization, you applied it to all the parameters, i.e. both the weights and the biases. Is that advisable, given that L2 is usually applied only to the weights?
l2_loss = tf.add_n( [tf.nn.l2_loss(var) for var in tf.trainable_variables()])
Shouldn't we add the tf.nn.l2_loss terms for the weights only?
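For example, something like this rough sketch, assuming the bias variables have "bias" in their names (which depends on how the layers were defined):

import tensorflow as tf

# Sketch: sum the L2 loss over weight variables only, skipping any
# trainable variable whose name suggests it is a bias term.
l2_loss = tf.add_n([
    tf.nn.l2_loss(var)
    for var in tf.trainable_variables()
    if 'bias' not in var.name.lower()
])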