Some questions about the robustness under other attacks

THUYimingLi commented 4 years ago

Hi, thanks for your code and idea. The results are very surprising and appealing.

I adopted your techniques (cycle LR and FGSM with random initialization) in my method (not AT but very similar to AT), and it worked very well when the attack is 'FGSM-type', including FGSM, PGD, and MI-FGSM. However, the adversarial robustness degrades shapely compared with the corresponding one solved with PGD when I evaluate the model under other types of attacks (e.g., CW and JSMA) on the MNIST dataset. Have you tried those attacks in your evaluation? Have you met the same problem?

Thanks for your work again and looking for your reply.

Yiming Li

riceric22 commented 4 years ago

Hey Yiming!

One of the reviewers asked us this during the ICLR reviewing process, and we did end up evaluating the FGSM-trained model with the L-infinity variation of the CW attack. However, the L-infinity CW attack ended up being less effective than the PGD attack, and most work in the literature that I've seen does not observe a significant improvement from the CW attack over the PGD attack in the L-infinity setting (you can see the details in our response here).

The JSMA attack to my knowledge is typically not meant for crafting L-infinity bounded adversarial examples, and creates adversarial attacks which attack a small number of pixels (e.g. L0 threat model), and is outside of the L-infinity threat model. So if you used FGSM training (which defends against L-infinity adversarial examples), I wouldn't be surprised that the robustness is heavily degraded with a JSMA attack (which creates an L0 adversarial example).

I would double check that the threat model of the attacks you are using match the defense on which it was trained on; generally speaking, we don't usually expect adversarial defenses to generalize beyond that which they were defended on (e.g. different norms or larger radius). You may be using an L2 CW attack, which is more commonly used than the L-infinity variant, and doesn't match the FGSM threat model.

Of course it's entirely possible that you've already adapted both of these attacks to the L-infinity threat model and still saw this behavior. In which case, I can only say we haven't seen this sharp degradation in robustness for PGD and CW attacks. However, the sharp degradation may be indicative of catastrophic overfitting, and so it may be worth investigating the learning curves for your method.

~Eric

THUYimingLi commented 4 years ago

Thanks!

your response addresses my problems. As you mentioned, I did use C&W with the l_2 norm. I will use the one with the l_inf norm to evaluate our model again. Although this method does not seem to be generalized well in the type of attack, it still a very interesting work! I will close the issue since my problems are all solved.

Thanks again and look forward to your new works.

~Yiming

riceric22 commented 4 years ago

Just in case you wanted to defend against L2 attacks, someone brought it up in issue #1, where we found that the natural one step analog of the L2 PGD attack also seemed to empirically defend against full L2 PGD attacks (and the L2 CW attack).

Otherwise, the robustness of the single step method is restricted to the threat model used during training, which is the same restriction as full PGD-based adversarial training.

locuslab / fast_adversarial

Some questions about the robustness under other attacks #2