tensorflow / neural-structured-learning

Training neural models with structured signals.
https://www.tensorflow.org/neural_structured_learning
Apache License 2.0

Base Model and Adversarial Model have the same accuracy #122

Closed AbdAlRahman-Odeh-99 closed 1 year ago

AbdAlRahman-Odeh-99 commented 2 years ago

Hello, I have followed the adversarial training tutorial, but I am now facing an issue when testing the robustness of the models. In the "Robustness under Adversarial perturbations" section, both the base model and the adversarial model return the same accuracy (most of the time 0.500000). I don't know precisely what the issue is, as I followed the tutorial. The task I am working on is binary classification of lung images. Can you please help me find out what is wrong? Thank you. [screenshot of the evaluation output attached]

csferng commented 2 years ago

Hi @AbdAlRahman-Odeh-99 , thanks for your question!

Could you share some details on how the models are constructed, and the parameters used when compiling the model? Also, what is the accuracy on clean (non-perturbed) images?

AbdAlRahman-Odeh-99 commented 2 years ago

Hello, and thanks for your response.

csferng commented 2 years ago

Thanks, @AbdAlRahman-Odeh-99.

I am not sure whether the 0.500000 accuracy on the adversarial test set is due to a bad model or to a bug in the evaluation setup. The latter is somewhat suspicious because the accuracy numbers were exactly the same. May I ask how many examples are in the test set? Also, what are HPARAMS.batch_size and current_step after the code you pasted is run?
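For concreteness, a manual evaluation loop along these lines is roughly what I have in mind; the names (`reference_model`, `models_to_eval`, `test_set_for_adv_model`, `'image'`, `'label'`) are placeholders rather than your actual code, but it shows why the test-set size, batch_size, and step counter all feed into the reported accuracy:

```python
import tensorflow as tf

# Hypothetical evaluation loop; the variable and feature names below are
# placeholders, not taken from your notebook or the tutorial verbatim.
correct = {name: 0 for name in models_to_eval}
num_examples = 0
current_step = 0

for batch in test_set_for_adv_model:
  perturbed_batch = reference_model.perturb_on_batch(batch)
  # Keep perturbed pixels in the same range as the clean images.
  perturbed_batch['image'] = tf.clip_by_value(perturbed_batch['image'], 0.0, 1.0)
  y_true = tf.reshape(perturbed_batch.pop('label'), [-1])
  num_examples += int(tf.shape(y_true)[0])  # the last batch may be partial
  current_step += 1
  for name, model in models_to_eval.items():
    # Assumes a softmax head with one unit per class.
    y_pred = tf.argmax(model(perturbed_batch), axis=-1)
    matches = tf.equal(tf.cast(y_true, y_pred.dtype), y_pred)
    correct[name] += int(tf.reduce_sum(tf.cast(matches, tf.int32)))

for name in models_to_eval:
  # Dividing by HPARAMS.batch_size * current_step instead of num_examples
  # overstates the denominator whenever the last batch is smaller than batch_size.
  print('%s accuracy on perturbed data: %f' % (name, correct[name] / num_examples))
```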

> The base accuracy is around 98-99%

Is this accuracy on clean testing data (not adversarially perturbed)? And is it achieved by the base model, the adversarial-regularized model, or both?

It is a little surprising if an adversarial-regularized model has >95% accuracy on clean data, but only 50% on adversarial data (assuming the adversarial hyperparameters are the same). It is possible that the adversarial attack was too strong and successfully fooled the model. This can be checked by evaluating the model with a smaller step_size in AdvRegConfig when constructing the reference_model. (Also set a smaller epsilon if pgd_iterations>1.)
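As a rough sketch of that suggestion (here `base_model`, the `'label'` key, and the concrete numbers are placeholders to adjust to your setup):

```python
import neural_structured_learning as nsl

# Weaker attack for evaluation: a smaller adv_step_size (and a smaller
# pgd_epsilon when pgd_iterations > 1). The numbers are illustrative only.
eval_adv_config = nsl.configs.make_adv_reg_config(
    multiplier=0.2,
    adv_step_size=0.01,       # try something well below the training value
    adv_grad_norm='infinity',
    # pgd_iterations=10,
    # pgd_epsilon=0.05,       # bound on total perturbation for multi-step PGD
)

reference_model = nsl.keras.AdversarialRegularization(
    base_model,               # the trained Keras model used to generate attacks
    label_keys=['label'],     # must match the label feature name in your dataset
    adv_config=eval_adv_config)
reference_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])
```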

> In addition, I have a question: is it possible for the base model to have the same robustness as the adversarially trained one?

I am not sure I understand your question correctly. Did you mean whether there is a scenario where a normally trained model performs as well as an adversarially trained one on adversarial data? Yes, it's possible. If the adversarial attack is too strong, say step_size=10 (where pixels are in [0, 1]), then probably all models are just making random guesses on adversarial data.
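As a toy illustration of why, outside of NSL entirely: with pixels in [0, 1], a single signed step of size 10 saturates every pixel after clipping, so nothing of the original image survives.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))                                   # pixels in [0, 1]
step_size = 10.0
sign_of_gradient = rng.choice([-1.0, 1.0], size=image.shape)   # stand-in for the true gradient sign

# FGSM-style step followed by clipping back to the valid pixel range.
perturbed = np.clip(image + step_size * sign_of_gradient, 0.0, 1.0)
print(np.unique(perturbed))  # -> [0. 1.]: every pixel is saturated, the image content is gone
```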

Or did you mean whether there is a way to improve robustness without adversarial training? This is an active area of research. For defending against natural distortions (blurring, Gaussian noise, etc.), there are methods like data augmentation (e.g. AugMix) and contrastive learning (e.g. SimCLR). For defending against adversarial attacks, adversarial training is a pretty common technique (you may view it as a kind of data augmentation), but there is also a stream of research on "certified" adversarial robustness (e.g. smoothing and denoising).

csferng commented 1 year ago

Closing this issue due to 30 days of inactivity. Please feel free to reopen if you have more questions.