Hi,
Thanks for sharing the implementation of the "Matching the Blanks" paper. I just wanted to know: what is the benefit of changing the gradient_acc_steps parameter? Does it make training/convergence faster?
With gradient_acc_steps = 2, the loop still backpropagates the loss for every batch, but it accumulates the gradients and only steps the optimizer (updates the weights) once every 2 batches. This lets you effectively double the batch size while keeping GPU memory usage fixed.
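For illustration, here is a minimal sketch of how gradient_acc_steps is typically wired into a PyTorch training loop; the model, optimizer, and data below are placeholders, not the repo's actual training code:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                      # dummy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
gradient_acc_steps = 2                        # accumulate gradients over 2 mini-batches

optimizer.zero_grad()
for step in range(8):                         # stand-in for iterating a dataloader
    inputs = torch.randn(4, 10)               # mini-batch of size 4
    labels = torch.randint(0, 2, (4,))
    loss = criterion(model(inputs), labels)
    # Scale the loss so the accumulated gradient matches one batch of size 4 * 2
    (loss / gradient_acc_steps).backward()    # gradients add up across backward() calls

    if (step + 1) % gradient_acc_steps == 0:
        optimizer.step()                      # weight update once every 2 batches
        optimizer.zero_grad()                 # clear the accumulated gradients
```

So it is less about making each step faster and more about getting the stability of a larger effective batch size on limited GPU memory.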