sokrypton / AccAdam_TF2

TF2-compatible Accumulated Gradients for Adam

Gradient accumulation #1

Open andreped opened 2 years ago

andreped commented 2 years ago

Just wanted to let you know that I have made a more generic implementation for gradient accumulation (GA), which wraps around the entire model instead of modifying the optimizer itself. It is a very simple concept and easy to implement.

See here: https://github.com/andreped/GradientAccumulator
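
For context, here is a minimal sketch of the idea (not the GradientAccumulator code itself; the class name `GAModel` and the `accum_steps` argument are hypothetical, and it assumes TF 2.x-style Keras with `compiled_loss`/`compiled_metrics`): `train_step` is overridden so gradients are summed over several mini-batches and only handed to the optimizer every `accum_steps` steps.

```python
import tensorflow as tf


class GAModel(tf.keras.Model):
    """Sketch: accumulate gradients over `accum_steps` mini-batches
    before letting the (unmodified) optimizer apply them."""

    def __init__(self, *args, accum_steps=4, **kwargs):
        super().__init__(*args, **kwargs)
        self.accum_steps = accum_steps
        self._step = tf.Variable(0, dtype=tf.int64, trainable=False)
        self._accum = None  # accumulator variables, created on the first train step

    def train_step(self, data):
        x, y = data
        if self._accum is None:
            # One non-trainable accumulator per trainable variable.
            self._accum = [tf.Variable(tf.zeros_like(v), trainable=False)
                           for v in self.trainable_variables]

        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            # Scale the loss so the sum over accum_steps steps behaves like a mean.
            loss = self.compiled_loss(y, y_pred) / self.accum_steps

        grads = tape.gradient(loss, self.trainable_variables)
        for acc, g in zip(self._accum, grads):
            acc.assign_add(g)
        self._step.assign_add(1)

        # Apply and reset every accum_steps mini-batches.
        if tf.equal(self._step % self.accum_steps, 0):
            self.optimizer.apply_gradients(zip(self._accum, self.trainable_variables))
            for acc in self._accum:
                acc.assign(tf.zeros_like(acc))

        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}
```

A model built the usual way could then be wrapped as, e.g., `GAModel(accum_steps=4, inputs=base.input, outputs=base.output)` and compiled/fit as normal (compiling with `run_eagerly=True` sidesteps tf.function subtleties around the conditional `apply_gradients`); the optimizer itself is left completely untouched, which is what makes the approach generic.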

There was also an attempt to make a wrapper around the optimizer, but it did not seem to work as intended.

There is also an active issue on this exact topic in the Keras repo, where the devs are currently working on adding API support for GA: https://github.com/keras-team/tf-keras/issues/107

andreped commented 2 years ago

Also, if you prefer wrapping the optimizer instead of overloading train_step, you could try this implementation, which in theory should work for all optimizers: https://github.com/andreped/GradientAccumulator/blob/main/GradientAccumulator/accumulator.py#L7

However, when I ran some benchmarks, I noticed that I did not get the expected results compared to regular batch training, so something is wrong with it.

For one, I believe the SUM reduction is wrong; it should be a MEAN reduction. That might be part of the solution.
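
To illustrate the MEAN-vs-SUM point outside of any wrapper, here is a minimal, self-contained sketch (plain TF2, with a hypothetical toy model and data, not the linked accumulator code) of a custom loop that divides the accumulated gradients by the number of accumulation steps before `apply_gradients`:

```python
import tensorflow as tf

# Hypothetical toy setup; any Keras model, optimizer and loss would do.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.MeanSquaredError()  # mean-reduced per mini-batch

accum_steps = 4
accum_grads = None  # running sum of gradients, created after the first backward pass


def accumulate(x, y):
    """Compute gradients for one mini-batch and add them to the running sum."""
    global accum_grads
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    if accum_grads is None:
        accum_grads = [tf.zeros_like(g) for g in grads]
    accum_grads = [a + g for a, g in zip(accum_grads, grads)]


def apply_accumulated():
    """Apply the MEAN of the accumulated gradients, then reset the buffer."""
    global accum_grads
    mean_grads = [g / accum_steps for g in accum_grads]  # MEAN, not SUM
    optimizer.apply_gradients(zip(mean_grads, model.trainable_variables))
    accum_grads = None


# Accumulate over accum_steps mini-batches of 8 samples, then apply one update.
x = tf.random.normal((32, 8))
y = tf.random.normal((32, 1))
for step in range(accum_steps):
    accumulate(x[step * 8:(step + 1) * 8], y[step * 8:(step + 1) * 8])
apply_accumulated()
```

With a mean-reduced loss per mini-batch, averaging the accumulated gradients is what reproduces the update of one large batch of the same total size (assuming equally sized mini-batches), whereas summing them effectively scales the gradients by the number of accumulation steps.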