Open tym0027 opened 5 years ago
This was done for gradient accumulation over multiple batches. When training with only a few GPUs, it is often helpful to accumulate gradients over many batches to increase the effective batch size per optimizer step. It is a hyperparameter, and the best value will vary from dataset to dataset.
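For anyone else landing here, the pattern described above can be sketched roughly like this (an illustrative example, not the repo's exact code; the model, data, and variable names are made up):

```python
import torch

# `num_steps_per_update` is the accumulation factor: gradients from that many
# forward/backward passes are summed before one optimizer step, emulating a
# batch that is num_steps_per_update times larger.
num_steps_per_update = 4

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Fake "dataloader": 8 mini-batches of (input, target) pairs.
batches = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(8)]

optimizer.zero_grad()
num_optimizer_steps = 0
for i, (x, y) in enumerate(batches):
    loss = torch.nn.functional.mse_loss(model(x), y)
    # Scale the loss so the accumulated gradient matches the mean
    # gradient over the larger effective batch.
    (loss / num_steps_per_update).backward()
    if (i + 1) % num_steps_per_update == 0:
        optimizer.step()       # one update per num_steps_per_update batches
        optimizer.zero_grad()
        num_optimizer_steps += 1

print(num_optimizer_steps)  # 8 batches / 4 accumulation steps = 2 updates
```

Because `backward()` adds into `.grad` rather than overwriting it, skipping `zero_grad()` between mini-batches is all that is needed to accumulate.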
@piergiaj I think `num_steps_per_update` must be a divisor of `len(dataloader)`, i.e. the number of forward passes required for the training data. Otherwise, the accumulated loss values from the leftover forward passes are zeroed when the phase changes from `train` to `val`. So the leftover forward passes' loss values are never used at all: after the validation phase, accumulation starts again from zero, even though there were leftover loss values from the previous training forward passes. Am I right?
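To make the concern concrete, here is a small sketch (my own illustration, not the repo's code) showing the remainder when `num_steps_per_update` does not divide the number of batches, plus a possible "flush" step as one way to avoid dropping those gradients:

```python
import torch

# With 10 batches and num_steps_per_update = 4, the last 10 % 4 = 2 backward
# passes never trigger optimizer.step(), so their gradients would be discarded
# when zero_grad() runs at the start of the next phase. A final flush step
# (an assumption here, not something train_i3d.py does) consumes them instead.
num_steps_per_update = 4
num_batches = 10

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

optimizer.zero_grad()
steps_taken = 0
pending = 0  # backward passes accumulated since the last optimizer step
for _ in range(num_batches):
    x, y = torch.randn(4, 8), torch.randn(4, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / num_steps_per_update).backward()
    pending += 1
    if pending == num_steps_per_update:
        optimizer.step()
        optimizer.zero_grad()
        steps_taken += 1
        pending = 0

# Flush the leftover 10 % 4 = 2 passes instead of silently dropping them.
if pending > 0:
    optimizer.step()
    optimizer.zero_grad()
    steps_taken += 1

print(steps_taken)  # 2 full accumulation windows + 1 flush = 3 updates
```

Whether the flush is worth it is a judgment call: the final update averages over fewer batches, so its gradient is noisier than the full-window updates.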
Forgive me if the answer is obvious, but I am using this PyTorch implementation with my own data and am confused about what a few lines of code in train_i3d.py are doing.
The optimizer has its gradients zeroed and the scheduler stepped every 4th iteration, and then the loss is reset every tenth scheduler step. Does this relate to something specific to the Charades dataset? While implementing my own project, I had assumed it was something specific to I3D.