Open dathudeptrai opened 4 years ago
You may want to check some implementation like : https://www.kaggle.com/kentaronakanishi/tf2-0-way-to-accumulate-gradients-in-custom-loop#Model
Adding accumulation to the Keras training loop can be done without much trouble. I don't have time right now, but if you want you can submit a PR with some changes :)
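For reference, a minimal sketch of what such a custom-loop accumulation could look like (the model, optimizer, and `accum_steps` below are made up for illustration; the Kaggle kernel linked above follows the same pattern):

```python
import tensorflow as tf

# Toy model/optimizer, purely illustrative.
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_fn = tf.keras.losses.MeanSquaredError()
accum_steps = 4

# One non-trainable accumulator variable per trainable weight.
accum = [tf.Variable(tf.zeros_like(v), trainable=False)
         for v in model.trainable_variables]

def train_step(micro_batches):
    """Accumulate gradients over `accum_steps` micro-batches, then apply once."""
    for x, y in micro_batches:
        with tf.GradientTape() as tape:
            # Divide by accum_steps so the sum of micro-batch means equals the
            # mean over the full virtual batch (assuming equal-sized micro-batches).
            loss = loss_fn(y, model(x, training=True)) / accum_steps
        grads = tape.gradient(loss, model.trainable_variables)
        for a, g in zip(accum, grads):
            a.assign_add(g)
    optimizer.apply_gradients(zip(accum, model.trainable_variables))
    for a in accum:
        a.assign(tf.zeros_like(a))  # reset for the next virtual batch
```

Note this only updates the weights once per `accum_steps` micro-batches; batch norm statistics (if any) are still updated per micro-batch, which is one known source of divergence.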
I tried gradient accumulation and I can't detect any boxes after training the model for 50 epochs. Clearly, accumulating gradients at batch size 16 is behaving differently from training at batch size 64.
I remember having done it with another repository, and aside from batch norm I didn't see any problems (there was a small performance drop, but nothing like no detections at all). Have you observed anything strange? (Printing losses/histograms)
Batch 1: gradient = 1
Batch 2: gradient = -1
After accumulation: gradient = 0
Is that correct?
Yes, it is supposed to be exactly the same as "intra-batch". I'm sure the "theoretical" part is good; we just need to check the computation. For example, running 8 batches of 1 and one batch of 8 is supposed to give the same gradient (with batch norm in non-training mode, if any).
So, it's impossible.
I tested it on my current project and it's working fine (I consider gradients equal when they're close enough, for me within 1e-6). I may test it on EfficientDet in my free time, but do you see what could be wrong?
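The equivalence check described here can be sketched as follows (toy Dense model with no batch norm, illustrative names; the claim is that one batch of 8 and eight accumulated batches of 1 should give the same gradient up to float32 round-off):

```python
import tensorflow as tf

tf.random.set_seed(0)

# Toy linear model; no batch norm, so the gradients should match exactly.
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal((8, 4))
y = tf.random.normal((8, 1))

# Gradient of one batch of 8.
with tf.GradientTape() as tape:
    big_loss = loss_fn(y, model(x, training=True))
big_grads = tape.gradient(big_loss, model.trainable_variables)

# Accumulated gradient of 8 batches of 1, each loss scaled by 1/8
# so that the sum reproduces the mean over the full batch.
accum = [tf.zeros_like(v) for v in model.trainable_variables]
for i in range(8):
    with tf.GradientTape() as tape:
        small_loss = loss_fn(y[i:i+1], model(x[i:i+1], training=True)) / 8.0
    grads = tape.gradient(small_loss, model.trainable_variables)
    accum = [a + g for a, g in zip(accum, grads)]

# Should be within float32 round-off (~1e-6).
max_diff = max(float(tf.reduce_max(tf.abs(bg - ag)))
               for bg, ag in zip(big_grads, accum))
```

If `max_diff` comes out large on the real model, the mismatch is most likely batch norm (its statistics see micro-batches, not the virtual batch) or a missing 1/N scaling.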
Hi, I saw there is an implementation of gradient checkpointing for TF1 code. Do you have a plan to support it for TF2/Keras? I think this is a useful feature. BTW, it would be great if you also supported gradient accumulation in this repo :D
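For what it's worth, TF2 does expose a building block for this: `tf.recompute_grad`, which re-runs the wrapped function during the backward pass instead of storing its intermediate activations (trading compute for memory). A minimal sketch with a made-up function and values:

```python
import tensorflow as tf

@tf.recompute_grad
def expensive_block(x):
    # Activations of this block are not kept; they are recomputed on backprop.
    return tf.nn.relu(x) ** 2

x = tf.Variable([1.0, -2.0, 3.0])
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(expensive_block(x))
g = tape.gradient(loss, x)
# d/dx relu(x)^2 = 2x for x > 0, else 0  →  [2., 0., 6.]
```

Wrapping whole Keras blocks this way has known caveats around captured variables, so this is only a starting point, not a drop-in checkpointing solution for the repo.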