zhxfl / CUDA-CNN

CNN accelerated by CUDA. Tested on MNIST and finally reaching 99.76% accuracy.

The batch size setting #8

Open jinyyy666 opened 7 years ago

jinyyy666 commented 7 years ago

Hi guys,

Really appreciate your elegant code!

Right now I am playing with the batch size setting. For the MNIST dataset, when I set the batch size to 1, the weights become NaN. I guess that is because the learning rate is too large for batch = 1. Any ideas why this is happening?

Thanks,

Jimmy
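
Regarding the question above, a common rule of thumb (not necessarily what CUDA-CNN does internally) is to scale the learning rate roughly in proportion to the batch size, so a rate tuned for a large batch is shrunk when the batch drops to 1. A minimal C++ sketch of that scaling rule; the values `base_lr` and `base_batch` are illustrative assumptions, not parameters taken from this repository:

```cpp
#include <cstdio>
#include <initializer_list>

// Hypothetical linear learning-rate scaling: the rate grows/shrinks with batch size.
// base_lr is assumed to have been tuned for base_batch (e.g. 128 on MNIST).
double scaled_lr(double base_lr, int base_batch, int batch) {
    return base_lr * static_cast<double>(batch) / static_cast<double>(base_batch);
}

int main() {
    const double base_lr    = 0.05; // illustrative value, not from CUDA-CNN's config
    const int    base_batch = 128;  // illustrative value

    // A rate tuned for batch 128 is far too aggressive for single-sample updates;
    // under this rule the rate for batch = 1 is 128x smaller.
    for (int batch : {128, 32, 8, 1}) {
        std::printf("batch = %3d  ->  lr = %.6f\n",
                    batch, scaled_lr(base_lr, base_batch, batch));
    }
    return 0;
}
```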

zhxfl commented 7 years ago

It's a characteristic of the stochastic gradient descent algorithm.

jinyyy666 commented 7 years ago

Thanks for the reply! I googled this issue and found that it is quite common when the batch is very small. With batch = 1, each update is computed from a single sample, so the gradient is very noisy, and together with a learning rate that is too large this can lead to numerical instability, as pointed out in this link: https://datascience.stackexchange.com/questions/15962/why-is-learning-rate-causing-my-neural-networks-weights-to-skyrocket
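
One mitigation often suggested for this failure mode (a sketch under assumptions, not something CUDA-CNN necessarily implements) is to clip the gradient norm before each SGD step and to check for non-finite weights afterwards, so a single noisy sample cannot blow up the parameters. The hyperparameter `max_norm` below is hypothetical:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Clip the gradient to a maximum L2 norm, then apply a plain SGD update.
// max_norm is a hypothetical hyperparameter, not one exposed by CUDA-CNN.
void clip_and_update(std::vector<double>& w, const std::vector<double>& grad,
                     double lr, double max_norm) {
    double norm_sq = 0.0;
    for (double g : grad) norm_sq += g * g;
    const double norm  = std::sqrt(norm_sq);
    const double scale = (norm > max_norm) ? max_norm / norm : 1.0;

    for (std::size_t i = 0; i < w.size(); ++i)
        w[i] -= lr * scale * grad[i];
}

// Detect the failure mode described in the issue: any NaN/Inf in the weights.
bool has_non_finite(const std::vector<double>& w) {
    for (double x : w)
        if (!std::isfinite(x)) return true;
    return false;
}

int main() {
    std::vector<double> w{0.1, -0.2, 0.3};
    // A single noisy sample can produce a very large gradient.
    std::vector<double> grad{50.0, -80.0, 120.0};

    clip_and_update(w, grad, /*lr=*/0.05, /*max_norm=*/5.0);
    std::printf("weights finite after update: %s\n", has_non_finite(w) ? "no" : "yes");
    return 0;
}
```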