locuslab / fast_adversarial

[ICLR 2020] A repository for extremely fast adversarial training using FGSM

About PGD evaluation #10

Closed feather0011 closed 4 years ago

feather0011 commented 4 years ago

Hi, thank you for the great work and for open-sourcing the code.

However, I have a question about the PGD evaluation.

In the code, when attack_pgd is called, it seems that for some images in a batch the adversarial perturbation is obtained with fewer steps than attack_iters.

During the iteration, the perturbation delta is updated only for the images that are still classified correctly. (index is the variable indicating the correctly classified images, and only delta[index[0]] is updated inside the loop for _ in range(attack_iters):)
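
To make this concrete, here is a condensed sketch of how I read that inner loop (paraphrased and simplified, e.g. clipping with torch.clamp and assuming a scalar epsilon and images in [0, 1] rather than the repo's clamp / lower_limit / upper_limit helpers):

```python
import torch
import torch.nn.functional as F

def attack_pgd_sketch(model, X, y, epsilon, alpha, attack_iters):
    """My reading of the inner loop: delta is updated only where the model is still correct."""
    delta = torch.zeros_like(X).uniform_(-epsilon, epsilon)
    delta.requires_grad = True
    for _ in range(attack_iters):
        output = model(X + delta)
        # index picks out the images that are still classified correctly
        index = torch.where(output.max(1)[1] == y)
        if len(index[0]) == 0:  # every image is already fooled -> stop early
            break
        loss = F.cross_entropy(output, y)
        loss.backward()
        grad = delta.grad.detach()
        # only delta[index[0]] moves; already-misclassified images keep their current delta
        d = delta.detach()[index[0]] + alpha * torch.sign(grad[index[0]])
        d = torch.clamp(d, -epsilon, epsilon)
        # keep X + delta inside the valid image range (assuming inputs in [0, 1])
        delta.data[index[0]] = torch.clamp(X[index[0]] + d, 0, 1) - X[index[0]]
        delta.grad.zero_()
    return delta.detach()
```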

I understand that images which are still classified correctly are not yet adversarial examples, so more search inside the l-inf ball should be performed to find an adversarial perturbation for them.

However, I don't understand why the search should stop for the images that become misclassified in an early step of the PGD iteration.

I would expect that a stronger adversarial perturbation can be found by performing more PGD iterations, even if an image is already adversarial. In other words, I suspect that the PGD evaluation is performed with relatively weak adversarial examples.

These may be adversarial examples that are (approximately, if not exactly) closer to the original images, but they are not strong adversarial examples. And I think the strength of the adversarial examples is crucial, because the main claim of the paper is that training with FGSM can build models that are robust to strong attacks such as PGD.

I think something like max_delta[all_loss >= max_loss] = delta.detach()[all_loss >= max_loss], which currently appears in the loop for zz in range(restarts):, should also be performed inside the loop for _ in range(attack_iters): to find the strongest adversarial example achievable within attack_iters steps, as in the sketch below.
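
Concretely, I have in mind something like the following (just a sketch of the change, untested, and again simplified to a scalar epsilon and images in [0, 1]):

```python
import torch
import torch.nn.functional as F

def attack_pgd_strongest(model, X, y, epsilon, alpha, attack_iters):
    """Sketch of the variant I have in mind: keep updating every image and
    remember, per image, the delta that achieved the highest loss so far."""
    # cross-entropy is non-negative, so zero is a safe initial value for the running max
    max_loss = torch.zeros(y.shape[0], device=X.device)
    max_delta = torch.zeros_like(X)
    delta = torch.zeros_like(X).uniform_(-epsilon, epsilon).requires_grad_(True)
    for _ in range(attack_iters):
        output = model(X + delta)
        loss = F.cross_entropy(output, y)
        loss.backward()
        grad = delta.grad.detach()
        # update all images, not only the correctly classified ones
        d = torch.clamp(delta.detach() + alpha * torch.sign(grad), -epsilon, epsilon)
        delta.data = torch.clamp(X + d, 0, 1) - X
        delta.grad.zero_()
        # track, per image, the delta with the highest loss seen across iterations
        with torch.no_grad():
            all_loss = F.cross_entropy(model(X + delta), y, reduction='none')
        max_delta[all_loss >= max_loss] = delta.detach()[all_loss >= max_loss]
        max_loss = torch.max(max_loss, all_loss)
    return max_delta
```

(This does an extra forward pass per step just to record the loss; one could instead reuse the output of the next iteration, but I kept it simple here.)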

But of course, I may be missing something. Can you explain the underlying idea behind stopping the iteration for images that are already misclassified while building the PGD perturbation?

feather0011 commented 4 years ago

Sorry, I now see that attack_pgd is used for evaluation only (not for PGD training), and for evaluation, if a test image is already fooled by a weak adversarial perturbation, a stronger perturbation does not need to be computed.
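
For anyone who finds this later: the robust accuracy only checks whether X + pgd_delta is misclassified, so a perturbation with higher loss cannot change the result once an image is already fooled. Roughly (my paraphrase of the evaluation step, not the exact repo code):

```python
import torch

@torch.no_grad()
def robust_accuracy(model, X, y, pgd_delta):
    # The metric only counts correct vs. incorrect predictions on the perturbed images,
    # so making an already-misclassified image "more misclassified" changes nothing.
    output = model(X + pgd_delta)
    return (output.max(1)[1] == y).float().mean().item()
```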