Closed: JonasGeiping closed this issue 1 year ago
To add additional information, there are actually two variants of ResNet: one for ImageNet's 224x224 images and one for CIFAR-10's 32x32 image size. In the original ResNet paper (https://arxiv.org/abs/1512.03385), Section 4.2 discusses the architectural changes made for CIFAR-10 due to the 32x32 image size. The ResNet models in `torchvision.models` use the ImageNet variant.
Upon investigation, I've noticed that the CIFAR-10 ResNet implementation in this repo: https://github.com/kuangliu/pytorch-cifar/blob/master/models/resnet.py is the most commonly used by others. So this may also be a potential solution.
Comparing the two ResNet18 architectures (`torchvision.models.resnet18` vs the GitHub repo), here's a diff check of the PyTorch model print-outs: https://www.diffchecker.com/lz9Du86y

The left is the ImageNet variant and the right is the CIFAR-10 variant. I renamed/cleaned up things so only the real differences are shown.
Based on the diff check, the only differences are indeed the `conv1` and `maxpool` at the beginning.
What about the `avgpool` removal at the end?
The model still needs to have an `avgpool` layer. In the repo compared above, the avg pool is not a module, so it doesn't show up in the model definition.
Correct, both implementations have an `avgpool`. One uses a layer (module) while the other uses a functional call. The same is true for the `relu` layers. I removed all the ReLUs and AvgPools from the diff check.
I have only run a handful of comparison tests, but with the CIFAR-specific model I'm getting much better benign accuracy AND much better attack success.
I'm getting a bump from 0.75 to 0.85 classification accuracy (note: no data augmentation done here). The attack success for both DLBD and Sleeper Agent increased:

- DLBD, 10% poisoned: attack success went from 0.244 to 0.945
- Sleeper Agent, 10% poisoned with spectral signatures defense: attack success went from 0 to 0.61
- Sleeper Agent, 1% poisoned with no defense: attack success went from 0.486 to 0.814
These are very significant differences, so I'm going to merge in the cifar model update, then do a large scale run of the poisoning scenarios to generate updated baseline numbers.
Description of the bug
The model architectures used for evaluation in eval 5 and eval 6 for poisoning with CIFAR-10 are not optimal. If this analysis is correct, it may explain why these models perform worse than expected in both benign and robust accuracy.
Configurations in https://github.com/twosixlabs/armory/tree/master/scenario_configs/eval5/poisoning/baseline_defenses/cifar10, for example https://github.com/twosixlabs/armory/blob/master/scenario_configs/eval5/poisoning/cifar10_witches_brew.json, describe the model architecture as `armory.baseline_models.pytorch.resnet18` with input shape `[32, 32, 3]`. However, these ResNet18s are defined in https://github.com/twosixlabs/armory/blob/master/armory/baseline_models/pytorch/resnet18.py and draw directly from the PyTorch ImageNet implementations for ResNet18 models.

The PyTorch ResNets are not well-suited to inputs of shape [32,32,3], as they contain a downsampling stem (a 7x7 convolution with stride 2, followed by a 3x3 max pool). A better stem would be a single 3x3 convolution without stride. This type of CIFAR stem is common in the literature when using ResNet-18 models on CIFAR-10 and should reduce the classification error rate from 25% to about 5%. It is possible that the stem is modified somewhere else in the code (in which case this issue is wrong), but I could not find it in the code.
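To make the stem difference concrete, here is a small sketch (layer shapes taken from the torchvision ResNet source; the standalone `nn.Sequential` stems are illustrative, not armory code) showing what each stem does to a 32x32 input before any residual blocks run:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)

# ImageNet stem (as in torchvision's resnet18): 7x7 conv with stride 2,
# then a 3x3 max pool with stride 2 -- 32x32 collapses to 8x8
imagenet_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)
print(imagenet_stem(x).shape)  # torch.Size([1, 64, 8, 8])

# CIFAR stem: a single 3x3 conv without stride -- resolution is preserved
cifar_stem = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
print(cifar_stem(x).shape)  # torch.Size([1, 64, 32, 32])
```

With the ImageNet stem, the residual blocks only ever see 8x8 feature maps from a CIFAR image, throwing away most of the spatial information before any learning happens.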
This issue also applies to the models described for eval6.
Steps To Reproduce

Refer to `armory.baseline_models.pytorch.resnet18`.

Additional Information
This issue could be fixed by loading the ResNet model as done currently, but replacing the `.conv1` module with a 3x3 convolution without stride, and the `.maxpool` module with a `torch.nn.Identity`, for models configured for inputs with shape [32,32,3].