xingjunm / lid_adversarial_subspace_detection

Code for paper "Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality".
MIT License

cw-l2 accuracy 90% with resnet. #2

Closed. rajanieprabha closed this issue 6 years ago

rajanieprabha commented 6 years ago

Hello,

Using your code, I successfully created adversarial examples on the CIFAR10 dataset with the CW-L2 attack, and it seems to work. I then evaluated these adversarial examples (with the CIFAR10 test set labels) on a separately trained ResNet model, and the accuracy came out to 90%. If the adversarial examples are so strong, why is the accuracy close to the accuracy on clean data? I am clueless about the reason. Do you have any insight regarding this? Thanks!

xingjunm commented 6 years ago

Hi, the question is not clear to me: by 90%, do you mean the detection accuracy on ResNet-crafted advs, or the prediction accuracy of the non-ResNet model on ResNet advs? If you are using the CW-L2 attack, what is the attack success rate on the ResNet? Also note that CW-L2 can only attack the logits layer, so make sure you are attacking the right layer.
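To check the attack success rate, something along these lines should be enough (a minimal sketch, assuming a Keras model and one-hot test labels; the variable names are just placeholders):

```python
import numpy as np

# Placeholders: `model` is the model the advs were crafted on,
# `x_adv` the adversarial examples, `y_test` the one-hot clean labels.
preds_adv = np.argmax(model.predict(x_adv), axis=1)
labels = np.argmax(y_test, axis=1)

# Untargeted attack success rate = fraction of advs that are misclassified.
success_rate = np.mean(preds_adv != labels)
print('attack success rate: %.4f' % success_rate)
```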

rajanieprabha commented 6 years ago

90% of the time, the ResNet model makes correct predictions, i.e. even after the perturbations added via the CW-L2 attack, the ResNet model still classifies the examples correctly, which ideally shouldn't be the case. So basically it is the prediction accuracy of the ResNet model on advs crafted with your (non-ResNet) model. I used your code to create adversarial examples on the CIFAR10 test set (with the model from your code). All the examples had an attack success rate of 1. I saved those examples as an npy file, trained another ResNet model in isolation from your code, loaded the saved adversarial npy file, and checked its accuracy, which came out to 90%. Maybe I'm doing something wrong in this process. Any idea? Thanks!
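The transfer check is roughly like this (a minimal sketch; the file and model names here are placeholders):

```python
import numpy as np
from keras.models import load_model

# Placeholders: adversarial examples saved from your code, the CIFAR10
# test labels (one-hot), and my separately trained ResNet.
x_adv = np.load('adv_cw-l2_cifar.npy')
y_test = np.load('y_test_cifar.npy')
resnet = load_model('resnet_cifar10.h5')

# Note: the ResNet must expect the same input scaling that the advs
# were crafted with, otherwise this number is misleading.
preds = np.argmax(resnet.predict(x_adv), axis=1)
acc = np.mean(preds == np.argmax(y_test, axis=1))
print('ResNet accuracy on advs: %.4f' % acc)  # this is where I get ~0.90
```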

xingjunm commented 6 years ago

I see. It's about the transferability of the attacks. First, try CW-L2 with the parameter confidence=40 (the default is 0); this generates "strong" attacks with higher transferability to other DNNs. Second, your target model is a ResNet, which I assume is more complex than my non-ResNet model defined here; this also affects the transferability of the attacks. Generally, attacks generated on "complex" models transfer easily to "simple" models, but not the other way around. There are some papers on the transferability of attacks. This is a two-sided game: attack <-> defense. If you work on attacks, you need to increase the transferability of your new attacks; if you work on defenses, your defense model should be resistant to attacks that may be generated with different, possibly more complex, DNNs. Hope this clarifies something. :)
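For reference, a minimal sketch of crafting higher-confidence CW-L2 advs, here using the CleverHans CarliniWagnerL2 attack rather than the attack code in this repo (the parameter values are only examples):

```python
from keras import backend as K
from cleverhans.attacks import CarliniWagnerL2
from cleverhans.utils_keras import KerasModelWrapper

# Placeholders: `model` is the (non-ResNet) Keras model used for crafting,
# `x_sample`/`y_sample` a batch of CIFAR10 test images and one-hot labels.
sess = K.get_session()
cw = CarliniWagnerL2(KerasModelWrapper(model), sess=sess)

# confidence=40 pushes examples further past the decision boundary,
# which tends to make them transfer better to other DNNs.
x_adv = cw.generate_np(x_sample, y=y_sample,
                       confidence=40,
                       batch_size=100,
                       max_iterations=1000,
                       learning_rate=0.01,
                       binary_search_steps=9,
                       initial_const=1e-3,
                       clip_min=0.0, clip_max=1.0)
```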

rajanieprabha commented 6 years ago

Makes sense. I will check them out. Thanks a lot! :)

xingjunm commented 6 years ago

No worries. :)

rajanieprabha commented 6 years ago

Hi,

Thank you for your support all this while. I have one small question; if you have any insight, it would be great for my understanding. Why does your code remove the softmax layer when crafting adversarial examples, and hence not use the softmax layer even during detection? I am curious why the examples behave differently when evaluated with softmax. Softmax is just a kind of normalization, right?

Thank you for giving me your kind attention!

Regards, Rajanie


xingjunm commented 6 years ago

Hi, yes, softmax is normalization. However, SOME attacks only work with the logits (before softmax) so that the attack remains a solvable, straightforward optimization problem (this is how they are designed, and maybe a weakness). For example, the CW-L2 attack would fail at a high rate if it attacked the softmax layer (the deeper reason is the gradient problem). However, in my code the LID computation does include the softmax. You may want to try it empirically and see what happens when you attack the softmax. :)
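As a small sketch of what attacking the logits looks like in Keras (assuming the network ends with a Dense layer producing the logits followed by a separate Activation('softmax') layer; adjust the layer index otherwise):

```python
from keras.models import Model

# Placeholder: `model` is the trained classifier ending in
# Dense(num_classes) -> Activation('softmax').
logits_model = Model(inputs=model.input, outputs=model.layers[-2].output)

# Attacks such as CW-L2 are then run against `logits_model`, while the
# LID features / detection can still use `model`, i.e. all layers
# including the softmax output.
```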

rajanieprabha commented 6 years ago

Thank you so much! Much appreciated!
