How come Gradient Sum and EG do two gradient steps?

laura-rieger / deep-explanation-penalization

Code for using CDEP from the paper "Interpretations are useful: penalizing explanations to align neural networks with prior knowledge" https://arxiv.org/abs/1909.13584

MIT License

127 stars 14 forks source link

Hi! Thanks for sharing this repo!

In the MNIST Decoy code, for method 1 and 2 (gradient_sum and eg), there are two gradient steps per batch. The first step uses gradients from just explanation_penalty, and the second step uses the gradients from both the explanation_penalty and the log loss. I was wondering what the reason for this was?

Reference in code: https://github.com/laura-rieger/deep-explanation-penalization/blob/8249af7fecf92c2b93dc2e39baf4cfd1423b53b4/mnist/DecoyMNIST/train_mnist_decoy.py#L168

Thanks!

laura-rieger / deep-explanation-penalization

How come Gradient Sum and EG do two gradient steps? #10