[Open] ayushjn20 opened this issue 6 years ago
As far as I understand, this is totally fine for single-layer networks (i.e. perceptrons), and it also helps the GIF training animation show a clear progression rather than a noisy one. Deep nets definitely need random initialization, though. I'm curious what kind of difference a random initialization would actually make on the perceptron. Care to look into this, @ayushjn20?
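For reference, here's a minimal sketch of that comparison (not from this repo; the toy data and helper are hypothetical): a single-layer perceptron trained from both a zero and a small random initialization on linearly separable data. Both runs should converge, which is why zero initialization is harmless in the single-layer case.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linearly separable toy data: label is the sign of x0 + x1.
X = rng.uniform(-1, 1, size=(200, 2))
y = np.where(X.sum(axis=1) > 0, 1, -1)

def train_perceptron(w_init, lr=0.1, epochs=20):
    """Classic perceptron update rule; returns final weights and accuracy."""
    w, b = w_init.copy(), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified -> update
                w += lr * yi * xi
                b += lr * yi
    return w, np.mean(np.sign(X @ w + b) == y)

_, acc_zero = train_perceptron(np.zeros(2))
_, acc_rand = train_perceptron(rng.normal(scale=0.01, size=2))
print(f"zero init accuracy:   {acc_zero:.2f}")
print(f"random init accuracy: {acc_rand:.2f}")
```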
As far as I know, initializing the weights to zero is a highly improper way of initialization, even for a beginner. The CS231n lecture notes say:
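To make the symmetry argument concrete, here's a minimal NumPy sketch (hypothetical, not from the notes) of one backward pass through a two-layer net whose weights are all initialized to the same constant: every hidden unit computes the same output and therefore receives the same gradient, so the units can never differentiate. With exact zeros, the gradients into the first layer vanish entirely.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3))           # one input sample
t = np.array([[1.0]])                 # target

H = 4                                  # hidden units
W1 = np.full((3, H), 0.5)              # identical (constant) init;
W2 = np.full((H, 1), 0.5)              # zero init is the degenerate case

# Forward: tanh hidden layer, linear output, squared-error loss.
h = np.tanh(x @ W1)
y_hat = h @ W2
loss = 0.5 * ((y_hat - t) ** 2).sum()

# Backward.
dy  = y_hat - t                        # dL/dy
dW2 = h.T @ dy                         # output-layer gradient
dh  = dy @ W2.T                        # gradient into the hidden layer
dW1 = x.T @ (dh * (1 - h ** 2))        # input-layer gradient

# Every column of dW1 (one per hidden unit) is identical, so every
# hidden unit undergoes the exact same update, step after step.
print("hidden-unit gradients identical:", np.allclose(dW1, dW1[:, :1]))
```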