tingxueronghua / pytorch-classification-advprop


About performing adversarial training with advprop #7

Closed YinghuaGao closed 2 years ago

YinghuaGao commented 2 years ago

Hi, Yucheng,

Thanks for your nice implementation of AdvProp. I have a question about performing adversarial training with AdvProp. In standard training, the clean data pass through the main BNs and the generated adversarial data pass through the auxiliary BNs. However, if we perform adversarial training, only the adversarial data will be used. So how can we perform adversarial training with AdvProp?
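
For reference, a minimal sketch of the dual-BN training step described above. It assumes a hypothetical `model.set_bn_mode("main"/"aux")` switch for routing batches through the main or auxiliary BatchNorm layers; that switch and the hyperparameters are illustrative, not this repository's actual API:

```python
import torch
import torch.nn.functional as F

def advprop_step(model, x_clean, y, optimizer, eps=4/255, alpha=1/255, steps=1):
    # 1) Craft adversarial examples under the auxiliary BNs (as in the AdvProp paper).
    model.set_bn_mode("aux")  # hypothetical switch, for illustration only
    x_adv = x_clean.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # project back into the eps-ball around the clean input
        x_adv = torch.min(torch.max(x_adv, x_clean - eps), x_clean + eps).clamp(0, 1)

    # 2) Joint update: clean batch through the main BNs, adversarial batch
    #    through the auxiliary BNs, with the two losses summed.
    optimizer.zero_grad()
    model.set_bn_mode("main")
    loss_clean = F.cross_entropy(model(x_clean), y)
    model.set_bn_mode("aux")
    loss_adv = F.cross_entropy(model(x_adv), y)
    (loss_clean + loss_adv).backward()
    optimizer.step()
```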

tingxueronghua commented 2 years ago

I think it is quite hard to improve clean accuracy using only adversarial examples, because the model would never learn what "normal" images look like. A model can only understand data it has actually seen, unless people inject some prior knowledge, which is hard to do for adversarial examples.
Anyway, I did not continue to follow up on this work, and the understanding above might not be correct. If you have more questions, we can still discuss.

YinghuaGao commented 2 years ago

Thanks for your response. Maybe I did not make my question clear. I notice that the results reported in the AdvProp paper are mainly about clean accuracy. If we want to evaluate the robust accuracy of AdvProp, how should we generate adversarial examples with the trained model in a PGD style? When we generate the adversarial example for a clean test input, should we use the auxiliary BNs rather than the main BNs in the forward pass? And if we regard AdvProp as a standard training strategy, what is the adversarial training variant of AdvProp? Looking forward to your advice.

tingxueronghua commented 2 years ago
  1. I think it is better to use only clean inputs and the main BNs to generate adversarial examples. The auxiliary BNs are not used during inference, which is why I think they should not be used for generating adversarial examples.
  2. I am not sure about the results, but I think it is not reasonable to generate the adversarial example of a clean test input with the auxiliary BNs. In my opinion, an adversarial attack relies on the gradients, which might not be accurate under the different statistics in the auxiliary BNs.
  3. I think simply throwing away the auxiliary BNs and attacking the main BNs with clean inputs is okay (a rough sketch follows below). I didn't check the results before, so you could have a look at the original papers or run some experiments to see.
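
A minimal sketch of what point 3 suggests: discard the auxiliary BNs and run a standard PGD attack against the inference-time model (main BN statistics only). The helper names and hyperparameters below are illustrative, not from this repository:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=4/255, alpha=1/255, steps=10):
    """Standard PGD on the inference-time model (main BN statistics)."""
    model.eval()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

@torch.no_grad()
def robust_accuracy(model, loader, **pgd_kwargs):
    correct = total = 0
    for x, y in loader:
        with torch.enable_grad():                 # PGD itself needs gradients
            x_adv = pgd_attack(model, x, y, **pgd_kwargs)
        correct += (model(x_adv).argmax(1) == y).sum().item()
        total += y.numel()
    return correct / total
```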
tingxueronghua commented 2 years ago

If there are any other questions, feel free to ask.