Open Xie-Minghui opened 3 years ago
Sorry for the late reply. As you said, the gradient accumulation code is not implemented properly. I originally intended to use it, but then I got a new GPU card, so I never finished implementing it.
Thank you for the feedback. I will fix the code soon.
In your code, if `args.gradient_accumulation_steps > 1`, `loss.backward()` is never executed. But `loss.backward()` should run at every step, so the gradients accumulate; only `optimizer.step()` should be deferred. The normal gradient accumulation process is as follows:
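A minimal sketch of the standard loop (the names `accumulation_steps`, `model`, `optimizer`, and the toy data here are illustrative placeholders, not the repo's actual variables):

```python
import torch

accumulation_steps = 4  # stand-in for args.gradient_accumulation_steps
model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Toy data: 8 mini-batches of (input, target) pairs.
data = [(torch.randn(8, 2), torch.randn(8, 1)) for _ in range(8)]

updates = 0
for step, (x, y) in enumerate(data):
    loss = torch.nn.functional.mse_loss(model(x), y)
    # Scale the loss so the accumulated gradient matches the
    # average over the effective (larger) batch.
    (loss / accumulation_steps).backward()  # backward() runs on EVERY step
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()        # parameter update only every N steps
        optimizer.zero_grad()
        updates += 1
```

The key point: `backward()` accumulates gradients into `.grad` every step, while `optimizer.step()` and `zero_grad()` run only once per `accumulation_steps` steps.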
Please correct me if I've misunderstood something.