jinyyy666 opened this issue 7 years ago

Hi guys,

Really appreciate your elegant code!

Right now I am playing with the batch size setting. For the MNIST dataset, when I set the batch size to 1, the weights become NaN. I guess that is because the learning rate is too large for batch = 1. Any ideas why this is happening?

Thanks,
Jimmy

It's a characteristic of the stochastic gradient descent algorithm.

Thanks for the reply! I have been googling this issue and found that it is quite common with very small batches. With batch = 1, each update is computed from a single sample, so the gradient estimates are noisy, and a learning rate that works for larger batches can make the weights blow up, as pointed out in this link: https://datascience.stackexchange.com/questions/15962/why-is-learning-rate-causing-my-neural-networks-weights-to-skyrocket
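For anyone hitting the same thing, here is a minimal sketch of the effect in plain NumPy on a toy linear-regression problem (not this repository's code; the learning rate, data scale, and step count are made-up values for illustration): the same learning rate that is stable at batch = 32 makes batch = 1 overshoot until the weights overflow and turn into NaN.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10)) * 5.0         # fairly large-scale inputs
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=1000)

def sgd(lr, batch_size, steps=5000):
    """Plain SGD on mean-squared error; returns (steps run, final weights)."""
    w = np.zeros(10)
    for step in range(steps):
        idx = rng.integers(0, len(X), size=batch_size)
        xb, yb = X[idx], y[idx]
        grad = 2.0 * xb.T @ (xb @ w - yb) / batch_size   # MSE gradient
        w = w - lr * grad
        if not np.all(np.isfinite(w)):                   # overflow -> inf/NaN
            return step, w
    return steps, w

# The same learning rate that converges at batch = 32 blows up at batch = 1,
# because a single-sample gradient can be far larger than the batch average.
for bs in (32, 1):
    steps, w = sgd(lr=0.01, batch_size=bs)
    print(f"batch={bs}: ran {steps} steps, weights finite: {bool(np.all(np.isfinite(w)))}")
```

A common workaround is to shrink the learning rate as the batch size shrinks (or to clip the gradient norm), since a per-sample gradient can be much larger and noisier than a mini-batch average.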