hanjanghoon / BERT_FP

Fine-grained Post-training for Improving Retrieval-based Dialogue Systems - NAACL 2021

About gradient accumulation in your douban_final.py #2

Open Xie-Minghui opened 3 years ago

Xie-Minghui commented 3 years ago

In your code, if args.gradient_accumulation_steps > 1, loss.backward() is never executed. But loss.backward() should be executed at every step. The normal gradient accumulation process looks like the sketch below.
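(The original screenshots are not preserved, so here is a minimal, self-contained sketch of the standard pattern. The model, data, and gradient_accumulation_steps value are toy placeholders, not the actual douban_final.py setup.)

```python
import torch
from torch import nn

# Toy placeholders standing in for the real model and dataloader.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
gradient_accumulation_steps = 4

batches = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(12)]

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    loss = nn.functional.mse_loss(model(x), y)
    if gradient_accumulation_steps > 1:
        # Scale the loss so the accumulated gradient equals the
        # average over the effective (larger) batch.
        loss = loss / gradient_accumulation_steps
    # backward() must run on EVERY step so gradients accumulate.
    loss.backward()
    # Update weights only once per accumulation window.
    if (step + 1) % gradient_accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```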

I'm not sure if I'm mistaken about this.

hanjanghoon commented 3 years ago

Sorry for the late reply. As you pointed out, the gradient accumulation code is not implemented properly. I originally intended to use it, but after I got a new GPU card I no longer needed it, so I never finished the implementation.

Thank you for pointing this out. I will fix the code soon.