tml-epfl / understanding-fast-adv-training

Understanding and Improving Fast Adversarial Training [NeurIPS 2020]
https://arxiv.org/abs/2007.02617

Is the number of test samples 1000? #7

Closed CharlesJames13586 closed 2 years ago

CharlesJames13586 commented 2 years ago

Hello, may I ask whether the robust accuracy in your paper (accuracy under the PGD-50-10 attack) is measured on 1,000 test samples? When I replicated your code, my accuracy on the full test set (10,000 samples) was only 36.16% (FGSM+RS+GradAlign-AT), but the accuracy in the paper is 47.58% (FGSM+GradAlign-AT); the test dataset was CIFAR-10. I'm worried that some of my training hyperparameter settings are off.
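For reference, the evaluation attack named above (PGD with 50 iterations and 10 random restarts) can be sketched on a toy 1-D loss; everything here (the loss, step size, and variable names) is illustrative, the real attack maximizes the network's cross-entropy loss w.r.t. the input:

```python
import numpy as np

loss = lambda x: (x - 0.3) ** 2          # toy loss to maximize (stand-in for cross-entropy)
grad = lambda x: 2 * (x - 0.3)           # its gradient

def pgd(x0, eps, alpha, iters=50, restarts=10, seed=0):
    # Sketch of PGD-50-10: 50 sign-gradient ascent steps per restart,
    # 10 random restarts, keep the restart with the highest loss.
    rng = np.random.default_rng(seed)
    best_x, best_val = x0, -np.inf
    for _ in range(restarts):
        x = np.clip(x0 + rng.uniform(-eps, eps), x0 - eps, x0 + eps)  # random start
        for _ in range(iters):
            x = np.clip(x + alpha * np.sign(grad(x)), x0 - eps, x0 + eps)  # ascent + projection
        if loss(x) > best_val:
            best_x, best_val = x, loss(x)
    return best_x

x_adv = pgd(x0=0.0, eps=0.1, alpha=0.1 / 4)
print(round(x_adv, 3))  # -0.1: the worst-case point inside the eps-ball
```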

max-andr commented 2 years ago

Hi,

Thanks for the interest in our paper. Yes, we report adversarial accuracy only on 1000 test samples, as mentioned in the appendix: "We perform evaluation of standard accuracy using full test sets, but we evaluate adversarial accuracy using 1,000 random points on each dataset."
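The subsampling described in that appendix quote amounts to something like the following sketch (the placeholder arrays stand in for the real CIFAR-10 test set; the names are illustrative, not from the repo):

```python
import numpy as np

# Placeholders for the full CIFAR-10 test set (10,000 examples).
x_test = np.zeros((10000, 32, 32, 3), dtype=np.uint8)
y_test = np.zeros(10000, dtype=int)

# Evaluate adversarial accuracy on 1,000 random test points,
# standard accuracy on the full test set.
rng = np.random.default_rng(0)
idx = rng.choice(len(x_test), size=1000, replace=False)
x_eval, y_eval = x_test[idx], y_test[idx]
print(x_eval.shape)  # (1000, 32, 32, 3)
```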

> I found that my accuracy in all sample sets(10,000) was only 36.16%(FGSM+RS+GradAlign-at), but the accuracy in the paper was 47.58%(FGSM+GradAlign-AT)

I think 36.16% is indeed too low. Usually, the adversarial accuracy on 1k and 10k samples differs by at most 2%, so an 11% gap is too large and is most likely explained by an unsuitable choice of training hyperparameters. In particular, I'd suggest double-checking your training hyperparameter settings (e.g., the FGSM step size and the GradAlign regularization parameter).
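For context, the GradAlign regularizer from the paper penalizes misalignment between the input gradient at the clean point and at a randomly perturbed point. A minimal numpy sketch, with toy gradient vectors standing in for real backprop gradients and an illustrative `lam` value:

```python
import numpy as np

def grad_align_penalty(g_clean, g_noisy, lam=0.2):
    # GradAlign-style penalty: lam * (1 - cosine similarity) between the
    # input gradient at the clean input and at a randomly perturbed input.
    # In training, both gradients come from backprop of the loss w.r.t. x.
    cos = np.dot(g_clean, g_noisy) / (
        np.linalg.norm(g_clean) * np.linalg.norm(g_noisy) + 1e-12
    )
    return lam * (1.0 - cos)

# Aligned gradients incur (almost) no penalty; anti-aligned ones 2 * lam.
g = np.array([1.0, 2.0, -1.0])
print(round(grad_align_penalty(g, g), 6))   # 0.0
print(round(grad_align_penalty(g, -g), 6))  # 0.4
```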

I hope that helps.

Best, Maksym

CharlesJames13586 commented 2 years ago

Thank you very much for your reply. Your work has also given me some inspiration. I will try your suggestions and get back to you when I am finished.

CharlesJames13586 commented 2 years ago

I ran FGSM+RS+GradAlign-AT with fgsm_alpha set to 2.0, and it works (on all 10,000 test examples, the best model gets 42.11% and the last model 37.36%). Thank you very much.
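The perturbation used in FGSM+RS training can be sketched as follows; this assumes fgsm_alpha scales the step size relative to eps, which is my reading of the repo's --fgsm_alpha flag (double-check against the training code), and all names are illustrative:

```python
import numpy as np

def fgsm_rs_delta(grad, eps, fgsm_alpha, rng):
    # FGSM with a random start (Wong et al. style): uniform init in the
    # eps-ball, one sign-gradient step of size fgsm_alpha * eps (assumed
    # semantics of --fgsm_alpha), then projection back onto the eps-ball.
    delta = rng.uniform(-eps, eps, size=grad.shape)    # random start
    delta = delta + fgsm_alpha * eps * np.sign(grad)   # one FGSM step
    return np.clip(delta, -eps, eps)                   # project to eps-ball

rng = np.random.default_rng(0)
grad = rng.normal(size=(8,))
d = fgsm_rs_delta(grad, eps=8 / 255, fgsm_alpha=2.0, rng=rng)
print(bool(np.abs(d).max() <= 8 / 255 + 1e-12))  # True: stays in the ball
```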

max-andr commented 2 years ago

Great to hear that now the numbers are more reasonable!

By the way, when evaluated with AutoAttack, we got 43.93% robust accuracy for models trained with FGSM+GradAlign (reported here). But, of course, the exact numbers depend a lot on the hyperparameters and training schedule that may be different in your case.

I think I can close this issue. Feel free to reopen if you have further questions!

CharlesJames13586 commented 2 years ago

When I set lambda to 20, I got 43.30% accuracy for the last model in the FGSM+GradAlign experiment.