wagonhelm / NaNmnist

MNIST Tutorial

Highly improper way of weight initialization #2

Open ayushjn20 opened 6 years ago

ayushjn20 commented 6 years ago

As far as I know, initializing weights to zero, even as a beginner, is a highly improper way of initialization. The CS231n open lecture notes say:

> Pitfall: all zero initialization. Lets start with what we should not do. Note that we do not know what the final value of every weight should be in the trained network, but with proper data normalization it is reasonable to assume that approximately half of the weights will be positive and half of them will be negative. A reasonable-sounding idea then might be to set all the initial weights to zero, which we expect to be the "best guess" in expectation. This turns out to be a mistake, because if every neuron in the network computes the same output, then they will also all compute the same gradients during backpropagation and undergo the exact same parameter updates. In other words, there is no source of asymmetry between neurons if their weights are initialized to be the same.
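To make the quoted symmetry argument concrete, here is a minimal NumPy sketch (made-up layer sizes and data, not this repo's code) of a two-layer network trained from an all-zero start. Every hidden unit receives the same gradient at every step, so the columns of the first weight matrix stay identical throughout training:

```python
# A minimal sketch of the symmetry problem: a two-layer net started from all
# zeros never differentiates its hidden units, because they all receive
# identical gradients at every step.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 784))           # stand-in for a mini-batch of images
y = rng.integers(0, 10, size=16)         # stand-in for digit labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 784 inputs -> 64 hidden units -> 10 classes, everything initialized to zero
W1 = np.zeros((784, 64)); b1 = np.zeros(64)
W2 = np.zeros((64, 10));  b2 = np.zeros(10)

lr = 0.5
for step in range(5):
    # forward pass
    h = sigmoid(x @ W1 + b1)
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)            # softmax probabilities

    # backward pass (softmax cross-entropy)
    dlogits = p.copy()
    dlogits[np.arange(len(y)), y] -= 1.0
    dlogits /= len(y)
    dW2 = h.T @ dlogits
    dh = dlogits @ W2.T
    dz1 = dh * h * (1.0 - h)                     # sigmoid derivative
    dW1 = x.T @ dz1

    W1 -= lr * dW1; b1 -= lr * dz1.sum(axis=0)
    W2 -= lr * dW2; b2 -= lr * dlogits.sum(axis=0)

# After training, every hidden unit still has exactly the same weights:
print(np.allclose(W1, W1[:, :1]))   # True: all 64 columns of W1 are identical
print(np.allclose(W2, W2[:1, :]))   # True: all 64 rows of W2 are identical
```

Swapping the `np.zeros` calls for small random draws breaks the tie and lets the hidden units specialize.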

wagonhelm commented 6 years ago

As far as I understand, for single-layer networks (i.e. perceptrons) this is totally fine, and it also helps the GIF training animation show a clear progression rather than a noisy one. Deep nets definitely need random initialization. I'm curious what kind of difference a random initialization would make on the perceptron. Care to look into this, @ayushjn20?
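For the single-layer case, one way to check what difference (if any) random initialization makes is to train the same softmax layer from both starting points. A rough sketch, using synthetic data rather than the tutorial's MNIST pipeline, could look like this:

```python
# A rough comparison sketch (synthetic data, made-up sizes, not this repo's
# code): train the same single-layer softmax classifier from an all-zero
# start and from a small random start, then compare the loss curves.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 784))          # stand-in for flattened images
y = rng.integers(0, 10, size=256)        # stand-in for digit labels

def train(W, b, lr=0.5, steps=100):
    """Plain gradient descent on softmax cross-entropy; returns the loss curve."""
    losses = []
    for _ in range(steps):
        logits = x @ W + b
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        losses.append(-np.log(p[np.arange(len(y)), y]).mean())
        dlogits = p.copy()
        dlogits[np.arange(len(y)), y] -= 1.0
        dlogits /= len(y)
        W -= lr * (x.T @ dlogits)
        b -= lr * dlogits.sum(axis=0)
    return losses

zero_run = train(np.zeros((784, 10)), np.zeros(10))
rand_run = train(rng.normal(scale=0.01, size=(784, 10)), np.zeros(10))
print("final loss, zero init:  ", zero_run[-1])
print("final loss, random init:", rand_run[-1])
```

Since there is no hidden layer, the objective is convex and the zero start is just another point on the same loss surface, which is presumably why the perceptron gets away with it.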