Closed · my91porn · closed 3 years ago
Hi,
Great catch! Indeed, I missed this when I was writing the README. I have now updated it so that `backprop=True` is set only for `grad2`, and I also added the following comment:
> Note that we can use `backprop=True` on both gradients `grad1` and `grad2`, but, based on our experiments, this doesn't make a substantial difference. Thus, to save computation, one can just use `backprop=True` on one of the two gradients.
When we were writing the paper, we did check whether using backprop on both gradients makes a difference, and it turned out not to matter much: we got nearly identical results, just with slightly slower training. So we decided to use `backprop=True` only for the second gradient, i.e. as shown in `train.py`. But I forgot this detail when I was writing the README, so thanks a lot for catching it!
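To make the distinction concrete, here is a minimal PyTorch sketch of the GradAlign regularizer with `backprop=True` only on `grad2`, as described above. The helper name `get_input_grad` and its arguments mirror the shape of the code in `train.py`, but this is an illustrative reimplementation, not the repository's exact code; `grad_align_lambda` and the tiny model in the usage note are assumptions.

```python
import torch
import torch.nn.functional as F

def get_input_grad(model, X, y, eps, delta_init='none', backprop=False):
    # Hypothetical minimal version of the helper in train.py: returns
    # d loss / d input, optionally keeping the graph for second-order backprop.
    if delta_init == 'random_uniform':
        delta = torch.empty_like(X).uniform_(-eps, eps)
    else:
        delta = torch.zeros_like(X)
    delta.requires_grad_(True)
    loss = F.cross_entropy(model(X + delta), y)
    # create_graph=backprop decides whether this gradient itself is
    # differentiable (i.e. whether it "records grad").
    grad = torch.autograd.grad(loss, delta, create_graph=backprop)[0]
    if not backprop:
        grad = grad.detach()  # no second-order graph for this gradient
    return grad

def grad_align_reg(model, X, y, eps, grad_align_lambda=0.2):
    # GradAlign: penalize misalignment (1 - cosine similarity) between the
    # input gradient at the clean point and at a random perturbation.
    # backprop=True only on grad2; using it on both gave nearly identical
    # results in the authors' experiments, just slower training.
    grad1 = get_input_grad(model, X, y, eps, delta_init='none', backprop=False)
    grad2 = get_input_grad(model, X, y, eps, delta_init='random_uniform', backprop=True)
    cos = F.cosine_similarity(grad1.flatten(1), grad2.flatten(1), dim=1)
    return grad_align_lambda * (1.0 - cos).mean()
```

The regularizer is differentiable because `grad2` was computed with `create_graph=True`, so calling `.backward()` on it propagates second-order terms into the model parameters, while the detached `grad1` acts as a fixed reference direction.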
Best, Maksym
Hi! This is nice work; however, one point confuses me. In your README, you say the GradAlign regularizer is the following:

where `grad1` and `grad2` both record gradients. However, in your `train.py` you use `detach`, which does not record gradients. Which one is correct? https://github.com/tml-epfl/understanding-fast-adv-training/blob/65fda9b02dc5e25b374a47b532cddd6e0829e4b0/train.py#L162