tensorflow / neural-structured-learning

Training neural models with structured signals.
https://www.tensorflow.org/neural_structured_learning
Apache License 2.0

Base Model and Adversarial Model have the same accuracy #122

Closed AbdAlRahman-Odeh-99 closed 1 year ago

AbdAlRahman-Odeh-99 commented 2 years ago

Hello, I have followed the adversarial training tutorial, but I am now facing an issue when testing the robustness of the models. In the "Robustness under Adversarial perturbations" section, both the base model and the adversarial model return the same accuracy (most of the time 0.500000). I don't know precisely what the issue is, as I followed the tutorial. The task I am working on is binary classification of lung images. Can you please help me find out what is wrong? Thank you. [screenshot of the evaluation output attached]

csferng commented 2 years ago

Hi @AbdAlRahman-Odeh-99 , thanks for your question!

Could you share some details on how the models are constructed, and the parameters used when compiling the model? Also, what is the accuracy on clean (non-perturbed) images?

AbdAlRahman-Odeh-99 commented 2 years ago

Hello, and thanks for your response.

csferng commented 2 years ago

Thanks, @AbdAlRahman-Odeh-99.

I am not sure whether the 0.500000 accuracy on the adversarial test set is due to a bad model or to a bug in the evaluation setup. The latter is somewhat suspicious because the accuracy numbers were exactly the same. May I ask how many examples are in the test set? Also, what are HPARAMS.batch_size and current_step after the code you pasted is run?
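For concreteness, a manual evaluation loop along these lines is roughly what I have in mind; the names (`reference_model`, `models_to_eval`, `test_set_for_adv_model`, `'image'`, `'label'`) are placeholders rather than your actual code, but it shows why the test-set size, batch_size, and step counter all feed into the reported accuracy:

```python
import tensorflow as tf

# Hypothetical evaluation loop; the variable and feature names below are
# placeholders, not taken from your notebook or the tutorial verbatim.
correct = {name: 0 for name in models_to_eval}
num_examples = 0
current_step = 0

for batch in test_set_for_adv_model:
  perturbed_batch = reference_model.perturb_on_batch(batch)
  # Keep perturbed pixels in the same range as the clean images.
  perturbed_batch['image'] = tf.clip_by_value(perturbed_batch['image'], 0.0, 1.0)
  y_true = tf.reshape(perturbed_batch.pop('label'), [-1])
  num_examples += int(tf.shape(y_true)[0])  # the last batch may be partial
  current_step += 1
  for name, model in models_to_eval.items():
    # Assumes a softmax head with one unit per class.
    y_pred = tf.argmax(model(perturbed_batch), axis=-1)
    matches = tf.equal(tf.cast(y_true, y_pred.dtype), y_pred)
    correct[name] += int(tf.reduce_sum(tf.cast(matches, tf.int32)))

for name in models_to_eval:
  # Dividing by HPARAMS.batch_size * current_step instead of num_examples
  # overstates the denominator whenever the last batch is smaller than batch_size.
  print('%s accuracy on perturbed data: %f' % (name, correct[name] / num_examples))
```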

> The base accuracy is around 98-99%

Is this accuracy on clean testing data (not adversarially perturbed)? And is it achieved by the base model, the adversarial-regularized model, or both?

It is a little surprising if an adversarial-regularized model has >95% accuracy on clean data, but only 50% on adversarial data (assuming the adversarial hyperparameters are the same). It is possible that the adversarial attack was too strong and successfully fooled the model. This can be checked by evaluating the model with a smaller step_size in AdvRegConfig when constructing the reference_model. (Also set a smaller epsilon if pgd_iterations>1.)
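As a rough sketch of that suggestion (here `base_model`, the `'label'` key, and the concrete numbers are placeholders to adjust to your setup):

```python
import neural_structured_learning as nsl

# Weaker attack for evaluation: a smaller adv_step_size (and a smaller
# pgd_epsilon when pgd_iterations > 1). The numbers are illustrative only.
eval_adv_config = nsl.configs.make_adv_reg_config(
    multiplier=0.2,
    adv_step_size=0.01,       # try something well below the training value
    adv_grad_norm='infinity',
    # pgd_iterations=10,
    # pgd_epsilon=0.05,       # bound on total perturbation for multi-step PGD
)

reference_model = nsl.keras.AdversarialRegularization(
    base_model,               # the trained Keras model used to generate attacks
    label_keys=['label'],     # must match the label feature name in your dataset
    adv_config=eval_adv_config)
reference_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])
```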

> In addition, I have a question: is it possible for the base model to have the same robustness as the adversarially trained one?

I am not sure I understand your question correctly. Did you mean whether there is a scenario where a normally trained model performs as well as an adversarially trained one on adversarial data? Yes, it's possible. If the adversarial attack is too strong, say step_size=10 (where pixels are in [0, 1]), then probably all models are just making random guesses on adversarial data.
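As a toy illustration of why, outside of NSL entirely: with pixels in [0, 1], a single signed step of size 10 saturates every pixel after clipping, so nothing of the original image survives.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))                                   # pixels in [0, 1]
step_size = 10.0
sign_of_gradient = rng.choice([-1.0, 1.0], size=image.shape)   # stand-in for the true gradient sign

# FGSM-style step followed by clipping back to the valid pixel range.
perturbed = np.clip(image + step_size * sign_of_gradient, 0.0, 1.0)
print(np.unique(perturbed))  # -> [0. 1.]: every pixel is saturated, the image content is gone
```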

Or did you mean whether there is a way to improve robustness without adversarial training? This is an active area of research. For defending against natural distortions (blurring, Gaussian noise, etc.), there are methods like data augmentation (e.g. AugMix) and contrastive learning (e.g. SimCLR). For defending against adversarial attacks, adversarial training is a pretty common technique (you may view it as a kind of data augmentation), but there is also a stream of research on "certified" adversarial robustness (e.g. smoothing and denoising).

csferng commented 1 year ago

Closing this issue due to 30 days of inactivity. Please feel free to reopen if you have more questions.