Closed wppply closed 6 years ago
In the README.md, you claim that the iter_size option increases the effective batch size to batchsize * iter_size.
However, I did not see any code for this other than dividing the loss by iter_size.
Could you give more details? Thanks
Gradients are accumulated for iter_size iterations and then applied, so the weights are updated only once every iter_size iterations. The division you are referring to averages the accumulated gradients.
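To make the equivalence concrete, here is a minimal pure-Python sketch (hypothetical toy loss and function names, not the repository's actual code) showing that accumulating iter_size per-batch gradients, each divided by iter_size, and applying them once gives the same update as one step on a batch of size batchsize * iter_size:

```python
def grad(w, batch):
    # Toy gradient of the loss 0.5 * (w - x)^2 averaged over a batch.
    return sum(w - x for x in batch) / len(batch)

def sgd_accumulated(w, batches, lr, iter_size):
    # Accumulate gradients for iter_size iterations, then apply once.
    acc = 0.0
    for i, batch in enumerate(batches, 1):
        acc += grad(w, batch) / iter_size   # division averages the gradients
        if i % iter_size == 0:
            w -= lr * acc                   # one update per iter_size iterations
            acc = 0.0
    return w

def sgd_large_batch(w, batches, lr, iter_size):
    # Equivalent update with one batch of size batchsize * iter_size.
    for i in range(0, len(batches), iter_size):
        big = [x for b in batches[i:i + iter_size] for x in b]
        w -= lr * grad(w, big)
    return w

batches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
w_acc = sgd_accumulated(0.0, batches, lr=0.1, iter_size=2)
w_big = sgd_large_batch(0.0, batches, lr=0.1, iter_size=2)
# Both paths end at the same weight.
```

This is why dividing the loss by iter_size is the only extra arithmetic needed: the accumulation itself happens implicitly by not zeroing and not applying the gradients until iter_size iterations have passed.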