jiaxiaojunQAQ / Comdefend

The code for ComDefend: An Efficient Image Compression Model to Defend Adversarial Examples (CVPR2019)

How do you generate adversarial examples? #1

Open · gabbaha opened this issue 5 years ago

gabbaha commented 5 years ago

Hi,

Thanks for your nice work in CVPR 2019. It's really interesting and provides strong results.

However, I found that the adversarial example generation process is not clearly described in either the paper or the released code. I'm really curious about the following question: in the considered threat model, does the attacker know that you use ComDefend? In other words, when generating BIM and CW adversarial examples, do you attack the full end-to-end differentiable system (ComCNN+RecCNN+classifier), or just the pre-trained image classifier? The answer leads to two subtly different possibilities (a minimal sketch of both attack setups is given after the references below):

  1. If attacking the full system only decreases the accuracy to about 40% on ImageNet (l-infinity, eps=16) as reported in the paper, then the proposed system basically obtains the best white-box adversarial robustness known to me (or is only slightly worse than [1]).
  2. If only the image classifier is attacked during evaluation, then there are actually no empirical results for ComDefend under white-box attacks. Carlini et al. [2,3,4] have already broken many defense methods that rely on gradient masking effects. These methods seem robust in the black-box setting, but drop to 0% accuracy under an appropriate BPDA attack in the white-box setting.

[1] Xie, Cihang, et al. "Feature Denoising for Improving Adversarial Robustness." CVPR, 2019.
[2] Athalye, Anish, Nicholas Carlini, and David Wagner. "Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples." ICML, 2018.
[3] Carlini, Nicholas. "Is AmI (Attacks Meet Interpretability) Robust to Adversarial Examples?" arXiv preprint arXiv:1902.02322, 2019.
[4] Carlini, Nicholas, and David Wagner. "Defensive Distillation is Not Robust to Adversarial Examples." arXiv preprint arXiv:1607.04311, 2016.
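
For concreteness, here is a minimal PyTorch-style sketch of the two attack setups I mean. `comcnn`, `reccnn`, and `classifier` are hypothetical placeholders for the ComDefend networks and the pre-trained classifier, not the released code:

```python
import torch.nn.functional as F

def grad_classifier_only(x, y, classifier):
    """Case 2: the attacker only differentiates the undefended classifier."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(classifier(x), y)
    loss.backward()
    return x.grad

def grad_full_pipeline(x, y, comcnn, reccnn, classifier):
    """Case 1: the attacker differentiates through ComCNN + RecCNN + classifier."""
    x = x.clone().detach().requires_grad_(True)
    purified = reccnn(comcnn(x))                   # ComDefend pre-processing
    loss = F.cross_entropy(classifier(purified), y)
    loss.backward()
    return x.grad
```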

jiaxiaojunQAQ commented 5 years ago

We just attack the pre-trained image classifier. When we attack the full end-to-end system (ComCNN+RecCNN+classifier), we cannot get a usable gradient of the system; there is a gradient explosion phenomenon. We tried to attack with a gradient replacement method, but found that the attack effect was not ideal.
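
For reference, the "gradient replacement" idea (BPDA, [2] in the first post) is usually implemented by keeping the defense in the forward pass but back-propagating through it as if it were the identity. A minimal sketch under that assumption, where `comdefend(x)` is a placeholder for ComCNN+RecCNN:

```python
import torch

def bpda_identity(x, comdefend):
    """Forward pass through the defense, but back-propagate as if it were the
    identity (straight-through estimator), avoiding the unstable gradients of
    differentiating through ComCNN+RecCNN."""
    with torch.no_grad():
        purified = comdefend(x)           # defense output, no gradient tracked
    return x + (purified - x).detach()    # value of `purified`, gradient of `x`

# usage inside an attack step (classifier, x_adv, y assumed given):
#   logits = classifier(bpda_identity(x_adv, comdefend))
#   loss = torch.nn.functional.cross_entropy(logits, y)
```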

vexilligera commented 5 years ago

Thanks for the contribution. As for the BIM evaluation, how many iterative steps did you run to optimize the adversarial images? I found that these hyper-parameters are missing from the paper.

jiaxiaojunQAQ commented 5 years ago

We use Foolbox to implement the BIM attack, with iterations=10. You can refer to the documentation: https://foolbox.readthedocs.io/en/latest/modules/attacks/gradient.html#foolbox.attacks.BIM
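
For anyone reproducing this, a minimal sketch with the older Foolbox (1.x/2.x) API that the linked page documents. Only iterations=10 comes from the comment above; the PyTorch backend, bounds, and epsilon are assumptions, and exact argument names may differ across Foolbox versions:

```python
import foolbox  # Foolbox 1.x / 2.x API, matching the linked documentation

def bim_attack(pytorch_model, image, label):
    """Run the L-infinity BIM attack with 10 iterations.
    `pytorch_model` is a pre-trained classifier in eval mode; epsilon and
    bounds are assumptions, not values stated in the paper."""
    fmodel = foolbox.models.PyTorchModel(pytorch_model, bounds=(0, 1), num_classes=1000)
    attack = foolbox.attacks.BIM(fmodel)
    # image: float32 numpy array in [0, 1] (CHW), label: ground-truth class index
    return attack(image, label, epsilon=16 / 255.0, iterations=10, binary_search=False)
```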

vexilligera commented 5 years ago

> We use Foolbox to implement the BIM attack, with iterations=10. You can refer to the documentation: https://foolbox.readthedocs.io/en/latest/modules/attacks/gradient.html#foolbox.attacks.BIM

Thanks for the reply. I'm also wondering whether you performed more than 10 steps (say 100 or 1000), and how the performance was.

jiaxiaojunQAQ commented 5 years ago

When we use 10 steps to attack the well-trained classifier, we can already achieve a high attack success rate. In my opinion, 10 steps already produce perceptible noise, and attacking with more steps is not meaningful, because there is already significant noise between the generated image and the original image.

vexilligera commented 5 years ago

> When we use 10 steps to attack the well-trained classifier, we can already achieve a high attack success rate. In my opinion, 10 steps already produce perceptible noise, and attacking with more steps is not meaningful, because there is already significant noise between the generated image and the original image.

I'm afraid that might not be the case. They performed up to 2000 steps in [1], and 10-step accuracy and 100-step accuracy are very different. Perhaps more iterations are needed to evaluate these new defense techniques.

[1] Xie, Cihang, et al. "Feature Denoising for Improving Adversarial Robustness." CVPR, 2019.
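
To make the step-count question concrete: in an L-infinity BIM/PGD loop the perturbation is projected back into the same epsilon ball at every iteration, so running 100 or 1000 steps does not make the noise any larger than 10 steps at the same epsilon; it typically only makes the attack stronger. A minimal sketch, where `model`, the epsilon, and the step size are assumptions:

```python
import torch
import torch.nn.functional as F

def linf_bim(model, x, y, epsilon=16 / 255.0, step_size=2 / 255.0, steps=10):
    """L-infinity BIM/PGD. Whatever `steps` is, the perturbation is projected
    back into the same epsilon ball around the clean image x at every step."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)  # epsilon projection
        x_adv = x_adv.clamp(0.0, 1.0)                                  # valid image range
    return x_adv
```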

jyuan1118 commented 5 years ago

If you attack the full system, I think the results will be completely different, especially with a PGD attack.

jiaxiaojunQAQ commented 5 years ago

> If you attack the full system, I think the results will be completely different, especially with a PGD attack.

As I have said, when we attack the full end-to-end system there is a gradient explosion phenomenon. In my opinion, the essence of compression is to shrink the adversarial subspace: it shifts the adversarial example back toward the clean sample. Of course, adversarial examples also exist in the compressed space, so you can certainly find adversarial examples with PGD. However, such adversarial examples may have a large gap from the clean image, i.e., the noise is obvious.
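
One way to check this claim empirically would be to run an unconstrained or minimum-distortion attack (e.g. CW with BPDA through ComDefend) against the full pipeline and then measure how far the successful adversarial examples are from the clean images. A small helper sketch for that measurement (names are placeholders):

```python
import torch

def perturbation_stats(x_clean, x_adv):
    """Per-image L-infinity and L2 distances between adversarial and clean batches,
    to quantify how visible the perturbation against the full pipeline is."""
    diff = (x_adv - x_clean).flatten(1)           # flatten all but the batch dimension
    return diff.abs().max(dim=1).values, diff.norm(dim=1)
```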