Closed: j50888, closed 7 years ago

I want to know why the batch size is set to 1 during training. Why not use a bigger one?
We apply scaling during data augmentation, so the input images have different sizes and cannot be stacked into a single batch. Instead, when training the parent network we batch over time: we average 10 gradients before updating the weights (controlled by the iter_mean_grad parameter of the train_parent function).
During fine-tuning we did not observe much difference between averaging several gradients and not doing so, so iter_mean_grad is set to 1 there to train faster.
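To make the gradient-averaging scheme concrete, here is a minimal sketch (not the repository's actual code): it averages iter_mean_grad single-image gradients before applying one weight update, which emulates a larger batch even though every forward/backward pass sees a single image of arbitrary size. The names model, loss_fn, optimizer, and image_label_stream are hypothetical placeholders, and the sketch uses the tf.GradientTape API rather than the original TF1 graph code.

```python
# Hedged sketch: average iter_mean_grad single-image gradients before each update.
# Hypothetical placeholders: model, loss_fn, optimizer, image_label_stream.
import tensorflow as tf

def train_with_mean_grad(model, loss_fn, optimizer, image_label_stream,
                         iter_mean_grad=10, num_updates=1000):
    for _ in range(num_updates):
        # Accumulate gradients from iter_mean_grad single-image passes.
        accum = [tf.zeros_like(v) for v in model.trainable_variables]
        for _ in range(iter_mean_grad):
            image, label = next(image_label_stream)   # one image, arbitrary size
            with tf.GradientTape() as tape:
                pred = model(image[tf.newaxis, ...], training=True)
                loss = loss_fn(label, pred)
            grads = tape.gradient(loss, model.trainable_variables)
            accum = [a + g for a, g in zip(accum, grads)]
        # Apply the mean of the accumulated gradients as a single weight update.
        mean_grads = [a / float(iter_mean_grad) for a in accum]
        optimizer.apply_gradients(zip(mean_grads, model.trainable_variables))
```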
Maybe we could apply the same scaling to all images within a batch, so they keep identical sizes and a bigger batch size becomes possible?
Yes, I agree that we could do that. For simplicity I didn't do it, but feel free to implement it.
Yes, this could speed things up by using a more consistent batch. I also see many TODOs in the code about moving the preprocessing steps into TensorFlow...
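As a rough illustration of the per-batch scaling idea suggested above (an assumed helper, not code from the repository), one could pick a single scale per mini-batch and resize every image and label with it, so the outputs share a shape and can be stacked; this assumes the input frames already share the same original resolution.

```python
# Hedged sketch (not existing repository code): apply one randomly chosen scale to
# every image in a mini-batch so the resized results share a shape and can be
# stacked into a single batch tensor. Assumes the inputs already have the same
# original resolution; `images` / `labels` are lists of numpy arrays.
import numpy as np
import cv2

def make_scaled_batch(images, labels, scales=(0.5, 0.8, 1.0)):
    scale = float(np.random.choice(scales))        # one scale for the whole batch
    h, w = images[0].shape[:2]
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    batch_imgs = np.stack([cv2.resize(im, (new_w, new_h)) for im in images])
    # Nearest-neighbor interpolation keeps segmentation masks binary after resizing.
    batch_lbls = np.stack([cv2.resize(lb, (new_w, new_h),
                                      interpolation=cv2.INTER_NEAREST)
                           for lb in labels])
    return batch_imgs, batch_lbls
```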
Thanks!