google / automl

Google Brain AutoML

Gradient Checkpointing and Accumulate gradient for TF2 ? #815

Open dathudeptrai opened 4 years ago

dathudeptrai commented 4 years ago

Hi, I saw there is an implementation of gradient checkpointing for the TF1 code. Do you have a plan to support it for TF2/Keras? I think this is a useful feature. BTW, it would be great if you also supported gradient accumulation in this repo :D.
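For reference, a minimal sketch of what gradient checkpointing could look like in TF2 is below, using `tf.recompute_grad` to re-run a block on the backward pass instead of storing its activations. The block, weights, and shapes are illustrative assumptions, not code from this repo, and behavior with variables has varied across TF 2.x versions.

```python
import tensorflow as tf

# Weights are passed as explicit arguments so the wrapped function is a
# pure function of tensors; tf.recompute_grad re-runs it during backprop
# instead of keeping the intermediate activations in memory.
@tf.recompute_grad
def block(x, w1, w2):
    h = tf.nn.relu(tf.matmul(x, w1))
    return tf.nn.relu(tf.matmul(h, w2))

w1 = tf.Variable(tf.random.normal([256, 256]))
w2 = tf.Variable(tf.random.normal([256, 256]))
x = tf.random.normal([8, 256])

with tf.GradientTape() as tape:
    y = block(x, w1, w2)
    loss = tf.reduce_mean(tf.square(y))
grads = tape.gradient(loss, [w1, w2])
```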

jcburnel commented 4 years ago

You may want to check an implementation like this one: https://www.kaggle.com/kentaronakanishi/tf2-0-way-to-accumulate-gradients-in-custom-loop#Model

With the Keras training loop, adding accumulation can be done without much trouble, as in the sketch below. I don't have time right now, but if you want you can submit a PR with some changes :)
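Roughly following the approach in that notebook, here is a minimal sketch of gradient accumulation in a TF2 custom training loop. The model, optimizer, and `accum_steps` value are illustrative assumptions, not code from this repo.

```python
import tensorflow as tf

# Illustrative model and optimizer; `accum_steps` controls how many
# micro-batches are accumulated before each optimizer update.
model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(20,))])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()
accum_steps = 4

# One non-trainable accumulator per trainable variable, initialized to zero.
accumulators = [tf.Variable(tf.zeros_like(v), trainable=False)
                for v in model.trainable_variables]

def train_step(x, y, step):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    for acc, g in zip(accumulators, grads):
        acc.assign_add(g)
    if (step + 1) % accum_steps == 0:
        # Average so the update matches one larger batch, then reset.
        optimizer.apply_gradients(
            [(acc / accum_steps, v)
             for acc, v in zip(accumulators, model.trainable_variables)])
        for acc in accumulators:
            acc.assign(tf.zeros_like(acc))
    return loss
```

Note that this only makes the gradient update equivalent to a larger batch; batch-norm statistics are still computed per micro-batch.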

fsx950223 commented 3 years ago

I tried gradient accumulation and I can't detect any box after training the model for 50 epochs. Obviously, accumulating gradients with batch size 16 is different from training with batch size 64.

jcburnel commented 3 years ago

I remember having done it with another repository, and besides batch norm I didn't see any problems (there was a small performance drop, but nothing like no detections at all). Have you observed anything strange? (e.g. by printing losses/histograms)

fsx950223 commented 3 years ago

Batch 1 gradient = 1, batch 2 gradient = -1, after accumulation gradient = 0. Is that correct?

jcburnel commented 3 years ago

Yes, it is supposed to be exactly the same as computing the gradient "intra-batch". I'm sure the theoretical part is right; we just need to check the computation. For example, running 8 batches of size 1 is supposed to give the same gradient as one batch of size 8 (with batch norm in non-training mode, if any).

fsx950223 commented 3 years ago

So, it's impossible.

jcburnel commented 3 years ago

I tested it on my current project and it's working fine (I consider gradients equal when they're close enough, within 1e-6 for me). I may test it on EfficientDet in my free time, but do you see what could be wrong?
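As a sanity check along those lines, here is a small self-contained sketch that compares the accumulated gradient of 8 micro-batches of size 1 against a single batch of 8, within a 1e-6 tolerance. It uses a toy linear model, not EfficientDet.

```python
import tensorflow as tf

tf.random.set_seed(0)
w = tf.Variable(tf.random.normal([4, 1]))
x = tf.random.normal([8, 4])
y = tf.random.normal([8, 1])

def mean_loss(xb, yb):
    # Per-batch mean squared error of a toy linear model.
    return tf.reduce_mean(tf.square(tf.matmul(xb, w) - yb))

# Gradient of one full batch of 8.
with tf.GradientTape() as tape:
    full = mean_loss(x, y)
full_grad = tape.gradient(full, w)

# Gradients of 8 micro-batches of size 1, accumulated then averaged.
acc = tf.zeros_like(w)
for i in range(8):
    with tf.GradientTape() as tape:
        micro = mean_loss(x[i:i + 1], y[i:i + 1])
    acc += tape.gradient(micro, w)
acc /= 8.0

# Should pass: the two gradients agree up to floating-point error.
tf.debugging.assert_near(full_grad, acc, atol=1e-6)
```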