nmndeep / revisiting-at

[NeurIPS 2023] Code for the paper "Revisiting Adversarial Training for ImageNet: Architectures, Training and Generalization across Threat Models"

Different results between Table 1 and 2 #1

Closed youzunzhi closed 1 year ago

youzunzhi commented 1 year ago

Thank you for the great work. I am just wondering if you can explain the reason behind the difference between Tables 1 and 2. For example, in Table 1 the ViT-S performance is (60.3, 30.4), while in Table 2 it is (61.5, 31.8) with random init and basic augmentation. I think the difference is that the models in Table 1 are pretrained with standard training for 100 epochs, while Table 2 uses random init. But if that's the case, why is Table 1 worse than Table 2? Thank you very much!
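For concreteness, here is a minimal PyTorch sketch of the two setups being compared; the toy model and checkpoint are placeholders for illustration, not the paper's actual ViT-S code or weights:

```python
import torch
import torch.nn as nn

def make_model():
    # Stand-in for a ViT-S; any nn.Module works for this illustration.
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# Table 2 setup: adversarial training starts from random initialization.
model_rand = make_model()

# Table 1 setup: adversarial training starts from a standardly
# (cleanly) pretrained checkpoint. Here a fresh model's state_dict
# stands in for the hypothetical 100-epoch clean-training weights.
clean_ckpt = make_model().state_dict()
model_pre = make_model()
model_pre.load_state_dict(clean_ckpt)

# Both models would then be adversarially trained with the same
# recipe; only the starting point differs.
```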

nmndeep commented 1 year ago

Hi, thanks for your interest in our work. We also noticed this 'outlier' stat. It is most likely an optimization artefact: when no strong regularization (such as heavy augmentation) is present, the behaviour of ViTs is hard to predict. Adding the ConvStem already improves on this, as the same comparison shows. In this particular instance the linf numbers in Table 1 are indeed slightly worse, but the numbers on unseen threat models are better (likely because of the absence of strong augmentation/regularization). All in all, we think that even in the low-epoch regime ViTs need features learned with stronger regularization, even if the model is initialized from a slightly better starting point.

youzunzhi commented 1 year ago

That makes sense to me. Thank you so much for your response!