Hi,
Thanks for sharing the implementation of the "Matching the Blanks" paper. I just wanted to know: what is the benefit of changing the gradient_acc_steps parameter? Does it make training/convergence faster?
With gradient_acc_steps = 2, the loop still backpropagates the loss for every batch, but it accumulates the gradients and only steps the optimizer (updates the weights) once every 2 batches. This lets you effectively double the batch size while keeping GPU memory usage fixed.
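For illustration, here is a minimal sketch of how gradient_acc_steps is typically wired into a PyTorch training loop; the model, optimizer, and data below are placeholders, not the repo's actual training code:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                      # dummy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
gradient_acc_steps = 2                        # accumulate gradients over 2 mini-batches

optimizer.zero_grad()
for step in range(8):                         # stand-in for iterating a dataloader
    inputs = torch.randn(4, 10)               # mini-batch of size 4
    labels = torch.randint(0, 2, (4,))
    loss = criterion(model(inputs), labels)
    # Scale the loss so the accumulated gradient matches one batch of size 4 * 2
    (loss / gradient_acc_steps).backward()    # gradients add up across backward() calls

    if (step + 1) % gradient_acc_steps == 0:
        optimizer.step()                      # weight update once every 2 batches
        optimizer.zero_grad()                 # clear the accumulated gradients
```

So it is less about making each step faster and more about getting the stability of a larger effective batch size on limited GPU memory.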