Open tym0027 opened 5 years ago
This was done for gradient accumulation over multiple batches. When training with only a few GPUs, it is often helpful to accumulate gradients over many batches to increase the effective batch size per optimizer step. It is a hyperparameter, and the best value will vary from dataset to dataset.
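For anyone else landing here, the pattern described above can be sketched roughly like this (an illustrative example, not the repo's exact code; the model, data, and variable names are made up):

```python
import torch

# `num_steps_per_update` is the accumulation factor: gradients from that many
# forward/backward passes are summed before one optimizer step, emulating a
# batch that is num_steps_per_update times larger.
num_steps_per_update = 4

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Fake "dataloader": 8 mini-batches of (input, target) pairs.
batches = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(8)]

optimizer.zero_grad()
num_optimizer_steps = 0
for i, (x, y) in enumerate(batches):
    loss = torch.nn.functional.mse_loss(model(x), y)
    # Scale the loss so the accumulated gradient matches the mean
    # gradient over the larger effective batch.
    (loss / num_steps_per_update).backward()
    if (i + 1) % num_steps_per_update == 0:
        optimizer.step()       # one update per num_steps_per_update batches
        optimizer.zero_grad()
        num_optimizer_steps += 1

print(num_optimizer_steps)  # 8 batches / 4 accumulation steps = 2 updates
```

Because `backward()` adds into `.grad` rather than overwriting it, skipping `zero_grad()` between mini-batches is all that is needed to accumulate.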
@piergiaj I think `num_steps_per_update` must be a divisor of `len(dataloader)`, i.e. the number of forward passes required for the training data. Otherwise, the accumulated loss values from the leftover forward passes are zeroed when the phase changes from `train` to `val`. So the leftover forward passes' loss values are never used at all: after the validation phase, accumulation starts again from zero, even though there were leftover loss values from the previous training forward passes. Am I right?
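To make the concern concrete, here is a small sketch (my own illustration, not the repo's code) showing the remainder when `num_steps_per_update` does not divide the number of batches, plus a possible "flush" step as one way to avoid dropping those gradients:

```python
import torch

# With 10 batches and num_steps_per_update = 4, the last 10 % 4 = 2 backward
# passes never trigger optimizer.step(), so their gradients would be discarded
# when zero_grad() runs at the start of the next phase. A final flush step
# (an assumption here, not something train_i3d.py does) consumes them instead.
num_steps_per_update = 4
num_batches = 10

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

optimizer.zero_grad()
steps_taken = 0
pending = 0  # backward passes accumulated since the last optimizer step
for _ in range(num_batches):
    x, y = torch.randn(4, 8), torch.randn(4, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / num_steps_per_update).backward()
    pending += 1
    if pending == num_steps_per_update:
        optimizer.step()
        optimizer.zero_grad()
        steps_taken += 1
        pending = 0

# Flush the leftover 10 % 4 = 2 passes instead of silently dropping them.
if pending > 0:
    optimizer.step()
    optimizer.zero_grad()
    steps_taken += 1

print(steps_taken)  # 2 full accumulation windows + 1 flush = 3 updates
```

Whether the flush is worth it is a judgment call: the final update averages over fewer batches, so its gradient is noisier than the full-window updates.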
Forgive me if the answer is obvious, but I am using this PyTorch implementation with my own data and am confused about what a few lines of code in train_i3d.py are doing.
The optimizer has its gradients zeroed and the scheduler stepped every 4th iteration, and then the loss is reset every tenth scheduler step. Does this relate to something specific to the Charades dataset? While implementing my own project, I had assumed it was something specific to I3D.