oysteijo / simd_neuralnet

Feed-forward neural network implementation in C with SIMD instructions
BSD 3-Clause "New" or "Revised" License

Consider how to deal with backpropagation of batches #66

Open oysteijo opened 11 months ago

oysteijo commented 11 months ago

(maybe this can be documentation rather than an issue)

How gradients of batches are calculated

In the general optimizer there is a function optimizer_calc_batch_gradient() that calculates the gradient of a batch. The trick used here is an integer index array, called pivot, owned by the abstract optimizer, which keeps track of the shuffled order of the samples. When calculating the batch gradient, the gradient of each sample is calculated separately (one by one) and the mean over all samples is returned as the batch gradient. To speed things up, the batch gradient calculation is threaded using OpenMP. This works really well, since the batch samples are selected through the shuffled index array and the sample data (input vector and target vector) are never moved around in memory. In short: the gradient of a batch is calculated sample-by-sample in a threaded manner.
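Roughly, the approach looks like the sketch below. Only optimizer_calc_batch_gradient() and the pivot array come from the description above; backprop_single(), the flat row-major data layout and the parameter names are assumptions made for illustration.

```c
#include <string.h>

/* Hypothetical single-sample backpropagation: writes one gradient
 * of grad_size floats for the given input/target pair. */
extern void backprop_single( const float *input, const float *target,
                             float *grad );

/* Sample-by-sample batch gradient over the shuffled pivot array.
 * The samples themselves are never moved in memory -- they are only
 * looked up indirectly through pivot. The array section reduction
 * needs OpenMP 4.5 or later. */
void calc_batch_gradient( const float *inputs,  int n_input,
                          const float *targets, int n_target,
                          const int *pivot, int batch_start, int batch_size,
                          int grad_size, float *batch_grad )
{
    memset( batch_grad, 0, grad_size * sizeof(float) );

    #pragma omp parallel for reduction(+: batch_grad[:grad_size])
    for ( int b = 0; b < batch_size; b++ ){
        const int s = pivot[ batch_start + b ];  /* shuffled sample index */
        float sample_grad[grad_size];            /* C99 VLA, one per iteration */
        backprop_single( inputs  + (size_t) s * n_input,
                         targets + (size_t) s * n_target,
                         sample_grad );
        for ( int i = 0; i < grad_size; i++ )
            batch_grad[i] += sample_grad[i];
    }

    /* The mean over the batch is returned as the batch gradient. */
    for ( int i = 0; i < grad_size; i++ )
        batch_grad[i] /= (float) batch_size;
}
```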

Alternative solution

An alternative would be to use the BLAS libraries more extensively and calculate the batch gradient with matrix operations. This would of course require a new implementation of backpropagation that takes the batch_size as a parameter. (The current backpropagation function can only take one sample/target pair and calculate its gradient.) The drawback of this approach is that we would then need to reconstruct the batch samples from the index array, and this involves a lot of memory copying, which can be a performance killer. Yet another alternative would be to actually shuffle the training dataset in memory in the shuffle() function. I am not sure which is most efficient.
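For comparison, a sketch of the BLAS route, assuming row-major storage and a plain dense layer. gather_batch() shows the memory copying this paragraph worries about, and layer_forward_batch() shows how a single SGEMM then handles the whole batch. None of these names come from the library itself; they are illustrative only.

```c
#include <string.h>
#include <cblas.h>

/* Gather step: batch_size memcpy()s per batch -- the potential
 * performance killer mentioned above. */
void gather_batch( const float *inputs, int n_input,
                   const int *pivot, int batch_start, int batch_size,
                   float *X /* batch_size x n_input, row-major */ )
{
    for ( int b = 0; b < batch_size; b++ )
        memcpy( X + (size_t) b * n_input,
                inputs + (size_t) pivot[batch_start + b] * n_input,
                n_input * sizeof(float) );
}

/* One dense layer over the whole batch in a single SGEMM:
 * Y = X * W, with W of shape n_input x n_out and Y of shape
 * batch_size x n_out (bias and activation omitted for brevity). */
void layer_forward_batch( const float *X, const float *W, float *Y,
                          int batch_size, int n_input, int n_out )
{
    cblas_sgemm( CblasRowMajor, CblasNoTrans, CblasNoTrans,
                 batch_size, n_out, n_input,
                 1.0f, X, n_input, W, n_out,
                 0.0f, Y, n_out );
}
```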

Another consideration

The threading used in the batch gradient calculation is based on OpenMP. This might conflict with other threading technologies. Could it be an idea to use classic pthreads instead of OpenMP?
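A rough sketch of what the pthreads variant could look like, reusing the hypothetical backprop_single() and data layout from the first sketch: each worker gets a slice of the batch and its own accumulator, and the main thread sums and averages afterwards.

```c
#include <pthread.h>
#include <stdlib.h>

extern void backprop_single( const float *input, const float *target,
                             float *grad );

typedef struct {
    const float *inputs, *targets;
    int n_input, n_target, grad_size;
    const int *pivot;
    int start, end;     /* half-open slice of the batch */
    float *grad;        /* this worker's accumulator, zeroed by caller */
} worker_arg_t;

static void *worker( void *p )
{
    worker_arg_t *a = (worker_arg_t *) p;
    float *sg = malloc( a->grad_size * sizeof(float) );
    for ( int b = a->start; b < a->end; b++ ){
        const int s = a->pivot[b];
        backprop_single( a->inputs  + (size_t) s * a->n_input,
                         a->targets + (size_t) s * a->n_target, sg );
        for ( int i = 0; i < a->grad_size; i++ )
            a->grad[i] += sg[i];
    }
    free( sg );
    return NULL;
}

/* Spawn, join, reduce. batch_grad and each args[t].grad must be
 * zeroed beforehand, and the slices must cover the whole batch. */
void batch_gradient_pthreads( worker_arg_t *args, int n_threads,
                              float *batch_grad, int grad_size, int batch_size )
{
    pthread_t tid[n_threads];
    for ( int t = 0; t < n_threads; t++ )
        pthread_create( &tid[t], NULL, worker, &args[t] );
    for ( int t = 0; t < n_threads; t++ )
        pthread_join( tid[t], NULL );
    for ( int t = 0; t < n_threads; t++ )
        for ( int i = 0; i < grad_size; i++ )
            batch_grad[i] += args[t].grad[i];
    for ( int i = 0; i < grad_size; i++ )
        batch_grad[i] /= (float) batch_size;
}
```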

oysteijo commented 2 months ago

I will guess that shuffling the dataset in memory is pretty fast - and it happens only once per epoch. But then again: if, instead of shuffling the input data between epochs, I use the pivot to copy batch_size samples into a contiguous buffer, won't that amount to about the same amount of copying?
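The in-memory shuffle could be a plain Fisher-Yates over the sample rows, sketched below with the same assumed row-major layout as above (rand() is used only for brevity).

```c
#include <stdlib.h>
#include <string.h>

/* Shuffle the dataset itself once per epoch, swapping whole input
 * and target rows so they stay paired. */
void shuffle_dataset( float *inputs, int n_input,
                      float *targets, int n_target, int n_samples )
{
    float tmp_in[n_input], tmp_tg[n_target];   /* C99 VLAs */
    for ( int i = n_samples - 1; i > 0; i-- ){
        const int j = rand() % (i + 1);
        memcpy( tmp_in, inputs + (size_t) i * n_input, n_input * sizeof(float) );
        memcpy( inputs + (size_t) i * n_input,
                inputs + (size_t) j * n_input, n_input * sizeof(float) );
        memcpy( inputs + (size_t) j * n_input, tmp_in, n_input * sizeof(float) );

        memcpy( tmp_tg, targets + (size_t) i * n_target, n_target * sizeof(float) );
        memcpy( targets + (size_t) i * n_target,
                targets + (size_t) j * n_target, n_target * sizeof(float) );
        memcpy( targets + (size_t) j * n_target, tmp_tg, n_target * sizeof(float) );
    }
}
```

Counting copies supports that intuition: the in-memory shuffle does roughly three row copies per sample per epoch, while the pivot-based gather copies each sample exactly once per epoch (every sample lands in exactly one batch), so the two should be in the same ballpark.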