levithomason / anny

Anny is an artificial neural network, yo!
http://levithomason.github.io/anny/

Batch training #62

Closed: levithomason closed this issue 8 years ago

levithomason commented 9 years ago

So far, batch training converges two orders of magnitude slower than online training. This article, and its benchmark image, may shed more light: https://visualstudiomagazine.com/Articles/2014/08/01/Batch-Training.aspx?Page=2

Also, batch training requires storing more data (delta and gradient accumulators). Lastly, since the weight update is now split from the delta and gradient calculation, we have to loop through the entire network two additional times every epoch compared to online training: once to reset the deltas/gradients, again to accumulate them, and a third time to update the weights from the accumulated deltas. Contrast this with online learning, where a single train method performs all three of these steps in one loop. A sketch of the difference follows below.
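For illustration, here is a minimal sketch of the loop-structure difference. This is not anny's actual API; every name here (`trainOnline`, `trainBatch`, `computeGradients`, `Sample`) is hypothetical, and the network is reduced to a flat weight array to keep the contrast visible:

```ts
type Sample = { input: number[]; target: number[] };

// Hypothetical gradient computation; stands in for a forward + backward pass.
declare function computeGradients(weights: number[], sample: Sample): number[];

// Online: one pass per sample. Compute the gradient and update immediately,
// so a single loop does all the work.
function trainOnline(weights: number[], samples: Sample[], rate: number): void {
  for (const sample of samples) {
    const grads = computeGradients(weights, sample);
    for (let i = 0; i < weights.length; i++) {
      weights[i] -= rate * grads[i];
    }
  }
}

// Batch: three passes per epoch, plus an extra accumulator array.
function trainBatch(weights: number[], samples: Sample[], rate: number): void {
  // Pass 1: reset the gradient accumulators.
  const acc = new Array<number>(weights.length).fill(0);

  // Pass 2: accumulate gradients over every sample without updating weights.
  for (const sample of samples) {
    const grads = computeGradients(weights, sample);
    for (let i = 0; i < weights.length; i++) {
      acc[i] += grads[i];
    }
  }

  // Pass 3: apply the averaged accumulated update once per epoch.
  for (let i = 0; i < weights.length; i++) {
    weights[i] -= (rate * acc[i]) / samples.length;
  }
}
```

The extra accumulator array is the "storing more data" cost mentioned above, and the three separate passes are the two additional loops per epoch relative to the single online loop.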

With the same training rate of 0.3, Anny takes an average of 420 epochs to train an OR gate, where the online method takes an average of 250 epochs.

An error is suspected, as the other gates do not converge. Also, the larger the network, the greater the chance of not converging even on an OR gate. This is the opposite of the online method, where slightly larger nets trained better on all gates compared to the minimum required net size.

levithomason commented 8 years ago

Closing this venture for now; I'll consider batch training again in the future if research shows vast improvements.