stefanpeidli opened 6 years ago
We need to solve the issue of introducing a batch size first, since stochastic gradient descent randomly chooses which data points from the batch to learn from. If the batch size is one, there is no real choice to be made... (But: we could randomly decide whether to learn from a board at all, i.e. we allow the empty selection {} to be made!)
I have implemented stochastic gradient descent. The stochastic coefficient denotes how large the random selection in one epoch is (a value of 0.8 means that in each epoch a randomly chosen 80% of the boards are looked at and the remaining 20% are ignored).
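A minimal sketch of the idea, assuming NumPy and a user-supplied gradient function (the names `sgd_epoch`, `grad_fn`, `boards`, and `targets` are illustrative, not the actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_epoch(weights, boards, targets, grad_fn, lr=0.01, stochastic_coeff=0.8):
    """One epoch of SGD that visits only a random `stochastic_coeff`
    fraction of the boards; the rest are ignored this epoch."""
    n = len(boards)
    k = max(1, int(stochastic_coeff * n))          # number of boards to visit
    chosen = rng.choice(n, size=k, replace=False)  # random subset, no repeats
    for i in chosen:
        weights = weights - lr * grad_fn(weights, boards[i], targets[i])
    return weights
```

With `stochastic_coeff=0.2` only about 20% of the boards contribute per epoch, which is where the time saving in the test below comes from.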
A first test:

- Blue: vanilla gradient descent. Time needed: 536 s
- Green: stochastic gradient descent with stochastic coefficient 0.2. Time needed: 143 s
- Red: stochastic gradient descent with stochastic coefficient 0.7. Time needed: 352 s
Further tests will have to show whether the time saving is worth the loss in accuracy, and which value of the stochastic coefficient is in fact the most reasonable one.
Faruk and I also came up with the idea of using an adaptive step size for the gradient descent method, since the error sometimes increases within an epoch. -> TODO
Atm we use the "strawberry" (plain) gradient descent method on the error surface given by the respective error function. The question now is: which method is best to use?
A candidate is stochastic gradient descent as described by Heining, chapter 2.2.