The current implementation uses itertools.islice(train_loader, gradient_accum_steps) to drive gradient accumulation, which is incorrect: each call to islice implicitly creates a fresh iterator over the loader, so it only ever consumes the first few batches and every update ends up being computed from the same data instead of advancing through the dataset. It would be better to iterate with a regular enumerate(train_loader) loop and perform the optimizer update only every gradient_accum_steps micro-batches, as sketched below.
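For reference, a minimal sketch of what the corrected loop could look like. The model, optimizer, loss function, and data here are toy placeholders standing in for the project's existing objects; only the loop structure is the point.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy setup; in the real code these come from the existing training script.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))),
    batch_size=4,
    shuffle=True,
)
gradient_accum_steps = 4

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(train_loader):
    loss = loss_fn(model(inputs), targets)
    # Scale the loss so the accumulated gradient approximates the average
    # over the effective (large) batch rather than its sum.
    (loss / gradient_accum_steps).backward()
    # Step the optimizer only once every gradient_accum_steps micro-batches,
    # then clear the accumulated gradients for the next group.
    if (step + 1) % gradient_accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

This way the loop keeps advancing through the DataLoader across updates, and the accumulation boundary is just the modulo check on the step index.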