tml-epfl / understanding-fast-adv-training

Understanding and Improving Fast Adversarial Training [NeurIPS 2020]
https://arxiv.org/abs/2007.02617

Some detail about GradAlign #3

Closed · my91porn closed this issue 3 years ago

my91porn commented 3 years ago

Hi! This is nice work; however, there is one point that confuses me. In your README, you say the GradAlign regularizer is the following:

# Input gradient at the clean point (delta = 0) and at a random point in the eps-ball
grad1 = utils.get_input_grad(model, X, y, opt, eps, half_prec, delta_init='none', backprop=True)
grad2 = utils.get_input_grad(model, X, y, opt, eps, half_prec, delta_init='random_uniform', backprop=True)
grad1, grad2 = grad1.reshape(len(grad1), -1), grad2.reshape(len(grad2), -1)
# GradAlign: penalize misalignment between the two gradients via cosine similarity
cos = torch.nn.functional.cosine_similarity(grad1, grad2, 1)
reg = grad_align_lambda * (1.0 - cos.mean())

where both grad1 and grad2 keep their computation graph (record grad). However, in your train.py you use detach, which does not record the gradient, so I am not sure which one is better: https://github.com/tml-epfl/understanding-fast-adv-training/blob/65fda9b02dc5e25b374a47b532cddd6e0829e4b0/train.py#L162
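
For context, here is a rough sketch of what the backprop flag could correspond to inside utils.get_input_grad (an illustrative approximation, not the repo's actual code):

import torch
import torch.nn.functional as F

def get_input_grad_sketch(model, X, y, eps, delta_init='none', backprop=False):
    # Illustrative approximation: gradient of the loss w.r.t. the input perturbation delta
    if delta_init == 'none':
        delta = torch.zeros_like(X, requires_grad=True)
    elif delta_init == 'random_uniform':
        delta = torch.empty_like(X).uniform_(-eps, eps).requires_grad_()
    else:
        raise ValueError(delta_init)
    loss = F.cross_entropy(model(X + delta), y)
    # create_graph=backprop keeps the graph of the gradient computation, so a
    # regularizer built on this gradient can itself be backpropagated through;
    # with backprop=False the returned gradient is detached (treated as a constant)
    grad = torch.autograd.grad(loss, delta, create_graph=backprop)[0]
    return grad if backprop else grad.detach()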

max-andr commented 3 years ago

Hi,

Great catch! Indeed, I missed this when writing the README. I have now updated it so that backprop=True is set only for grad2, and I also added the following comment:

Note that we can use backprop=True on both gradients grad1 and grad2, but based on our experiments this doesn't make a substantial difference. Thus, to save computation, one can use backprop=True on only one of the two gradients.

When we were writing the paper, we did check whether using backprop on both gradients has an influence, and it turned out to be small: we got nearly identical results, just with slightly slower training. So we decided to go with backprop=True only for the second gradient, as shown in train.py, but I forgot this detail when writing the README. Thanks a lot for catching this!
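
For reference, a minimal sketch of this cheaper variant (same variables as in the snippet above; an illustration of the idea rather than a verbatim copy of train.py):

# GradAlign with backprop kept only through grad2; grad1 is treated as a constant
grad1 = utils.get_input_grad(model, X, y, opt, eps, half_prec, delta_init='none', backprop=False)
grad2 = utils.get_input_grad(model, X, y, opt, eps, half_prec, delta_init='random_uniform', backprop=True)
grad1, grad2 = grad1.reshape(len(grad1), -1), grad2.reshape(len(grad2), -1)
cos = torch.nn.functional.cosine_similarity(grad1, grad2, 1)
reg = grad_align_lambda * (1.0 - cos.mean())
loss = loss + reg  # the GradAlign term is added to the training loss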

Best, Maksym