We're currently performing very well using 100 neurons and random initialization. This raises a question: does Xavier initialization reduce performance? If it does, that would suggest we're performing so well because of the enlarged variance that the randomly initialized weights and biases introduce when the number of neurons is large.
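To make the variance argument concrete, here is a minimal NumPy sketch comparing the two schemes on a single layer. It rests on several assumptions that are not stated in the text: "random initialization" is taken to mean standard-normal weights, the layer has 100 inputs as well as 100 neurons, the inputs are unit-variance noise, and Xavier initialization is the normal variant with standard deviation `sqrt(2 / (fan_in + fan_out))`. The function names are hypothetical, and biases are left out for brevity.

```python
import numpy as np


def random_normal_init(fan_in, fan_out, rng):
    # Assumed meaning of "random initialization": weights drawn from N(0, 1).
    return rng.standard_normal((fan_in, fan_out))


def xavier_normal_init(fan_in, fan_out, rng):
    # Xavier (Glorot) normal initialization: N(0, 2 / (fan_in + fan_out)).
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.standard_normal((fan_in, fan_out)) * std


rng = np.random.default_rng(0)

# Hypothetical layer shape: 100 inputs feeding 100 neurons.
fan_in, fan_out = 100, 100

# Unit-variance dummy inputs stand in for real data.
x = rng.standard_normal((1000, fan_in))

# Pre-activations of the layer under each scheme.
z_random = x @ random_normal_init(fan_in, fan_out, rng)
z_xavier = x @ xavier_normal_init(fan_in, fan_out, rng)

var_random = z_random.var()
var_xavier = z_xavier.var()

print(f"pre-activation variance, standard normal init: {var_random:.2f}")
print(f"pre-activation variance, Xavier init:          {var_xavier:.2f}")
```

Under these assumptions the pre-activation variance with standard-normal weights grows roughly in proportion to the number of inputs, while Xavier initialization keeps it close to 1. Whether that difference explains the good results can only be answered by retraining the actual network with both schemes.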