locuslab / fast_adversarial

[ICLR 2020] A repository for extremely fast adversarial training using FGSM

About PGD evaluation #10

Closed feather0011 closed 4 years ago

feather0011 commented 4 years ago

Hi, thank you for the great work and for open-sourcing the code.

However, I have a question about the PGD evaluation.

In the code, when attack_pgd is called, it seems that for some images in a batch the adversarial perturbation is obtained with fewer steps than attack_iters.

During the iteration, the perturbation delta is updated only for the images that are still classified correctly. (index is the variable indicating the correctly classified images, and only delta[index[0]] is updated inside the loop for _ in range(attack_iters):)
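
To make this concrete, here is a condensed sketch of how I read that inner loop (paraphrased and simplified, e.g. clipping with torch.clamp and assuming a scalar epsilon and images in [0, 1] rather than the repo's clamp / lower_limit / upper_limit helpers):

```python
import torch
import torch.nn.functional as F

def attack_pgd_sketch(model, X, y, epsilon, alpha, attack_iters):
    """My reading of the inner loop: delta is updated only where the model is still correct."""
    delta = torch.zeros_like(X).uniform_(-epsilon, epsilon)
    delta.requires_grad = True
    for _ in range(attack_iters):
        output = model(X + delta)
        # index picks out the images that are still classified correctly
        index = torch.where(output.max(1)[1] == y)
        if len(index[0]) == 0:  # every image is already fooled -> stop early
            break
        loss = F.cross_entropy(output, y)
        loss.backward()
        grad = delta.grad.detach()
        # only delta[index[0]] moves; already-misclassified images keep their current delta
        d = delta.detach()[index[0]] + alpha * torch.sign(grad[index[0]])
        d = torch.clamp(d, -epsilon, epsilon)
        # keep X + delta inside the valid image range (assuming inputs in [0, 1])
        delta.data[index[0]] = torch.clamp(X[index[0]] + d, 0, 1) - X[index[0]]
        delta.grad.zero_()
    return delta.detach()
```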

I understand that images which are still classified correctly are not yet adversarial examples, so more search inside the l-inf ball should be performed to find an adversarial perturbation for them.

However, I don't understand why the search should stop for the images that become misclassified in an early step of the PGD iteration.

I would expect that a stronger adversarial perturbation can be found by performing more PGD iterations, even if an image is already adversarial. In other words, I suspect that the PGD evaluation is performed with relatively weak adversarial examples.

These may be adversarial examples that are (approximately, if not exactly) closer to the original images, but they are not strong adversarial examples. And I think the strength of the adversarial examples is crucial, because the main claim of the paper is that training with FGSM can build models that are robust to strong attacks such as PGD.

I think something like max_delta[all_loss >= max_loss] = delta.detach()[all_loss >= max_loss], which currently appears in the loop for zz in range(restarts):, should also be performed inside the loop for _ in range(attack_iters): to find the strongest adversarial example achievable within attack_iters steps, as in the sketch below.
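
Concretely, I have in mind something like the following (just a sketch of the change, untested, and again simplified to a scalar epsilon and images in [0, 1]):

```python
import torch
import torch.nn.functional as F

def attack_pgd_strongest(model, X, y, epsilon, alpha, attack_iters):
    """Sketch of the variant I have in mind: keep updating every image and
    remember, per image, the delta that achieved the highest loss so far."""
    # cross-entropy is non-negative, so zero is a safe initial value for the running max
    max_loss = torch.zeros(y.shape[0], device=X.device)
    max_delta = torch.zeros_like(X)
    delta = torch.zeros_like(X).uniform_(-epsilon, epsilon).requires_grad_(True)
    for _ in range(attack_iters):
        output = model(X + delta)
        loss = F.cross_entropy(output, y)
        loss.backward()
        grad = delta.grad.detach()
        # update all images, not only the correctly classified ones
        d = torch.clamp(delta.detach() + alpha * torch.sign(grad), -epsilon, epsilon)
        delta.data = torch.clamp(X + d, 0, 1) - X
        delta.grad.zero_()
        # track, per image, the delta with the highest loss seen across iterations
        with torch.no_grad():
            all_loss = F.cross_entropy(model(X + delta), y, reduction='none')
        max_delta[all_loss >= max_loss] = delta.detach()[all_loss >= max_loss]
        max_loss = torch.max(max_loss, all_loss)
    return max_delta
```

(This does an extra forward pass per step just to record the loss; one could instead reuse the output of the next iteration, but I kept it simple here.)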

But of course, I may be missing something. Can you explain the underlying idea behind stopping the iteration for images that are already misclassified while building the PGD perturbation?

feather0011 commented 4 years ago

Sorry, I now see that attack_pgd is used for evaluation only (not for PGD training), and for evaluation, if a test image is already fooled by a weak adversarial perturbation, a stronger perturbation does not need to be computed.
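
For anyone who finds this later: the robust accuracy only checks whether X + pgd_delta is misclassified, so a perturbation with higher loss cannot change the result once an image is already fooled. Roughly (my paraphrase of the evaluation step, not the exact repo code):

```python
import torch

@torch.no_grad()
def robust_accuracy(model, X, y, pgd_delta):
    # The metric only counts correct vs. incorrect predictions on the perturbed images,
    # so making an already-misclassified image "more misclassified" changes nothing.
    output = model(X + pgd_delta)
    return (output.max(1)[1] == y).float().mean().item()
```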