Hi @macournoyer
We are dividing the training examples into mini-batches during training. However, it looks like we are updating the network's parameters after every individual training example.
Is this intentional, i.e. some kind of online training? I would have expected us to average the gradients over all the samples in a mini-batch and then do a single parameter update per batch.
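To make sure I'm being clear, something like the following is what I have in mind. This is just a rough Python/NumPy sketch of "accumulate, average, then update once per mini-batch", not the actual code in this repo; the tiny linear model and `model_grad` function are made up purely for illustration:

```python
import numpy as np

def model_grad(w, x, y):
    # Per-example gradient of the squared error 0.5 * (w.x - y)^2 w.r.t. w
    # (a stand-in for whatever gradient the real model computes).
    return (w @ x - y) * x

def minibatch_step(w, batch_x, batch_y, lr=0.01):
    # Accumulate gradients over the whole mini-batch without updating.
    grad_sum = np.zeros_like(w)
    for x, y in zip(batch_x, batch_y):
        grad_sum += model_grad(w, x, y)
    # Average over the batch and apply one update per mini-batch.
    grad_avg = grad_sum / len(batch_x)
    return w - lr * grad_avg

# Example usage with a toy batch of 4 examples and 3 features.
w = np.zeros(3)
batch_x = np.random.randn(4, 3)
batch_y = np.random.randn(4)
w = minibatch_step(w, batch_x, batch_y)
```

Right now it seems the update happens inside the per-example loop instead of after it, which would make the mini-batching only affect how the data is iterated, not how the gradients are applied.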
Thanks, Vikram