Closed psteinb closed 2 years ago
Hi, thank you for your support!
CIFAR-{10, 100}-C and ImageNet-C each consist of 75 datasets (data corrupted by 15 corruption types at 5 intensity levels each). The robustness in this paper is the average of the accuracies on these 75 corrupted datasets.
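For concreteness, here is a minimal sketch of that average, assuming you already have an evaluate_accuracy(model, corruption, severity) helper (a hypothetical name, not something provided by this repository) that evaluates a model on one corrupted split:

```python
# Minimal sketch of the robustness metric: the plain average of the accuracies
# over 15 corruption types x 5 intensity levels = 75 datasets.
# `evaluate_accuracy` is a hypothetical helper; plug in your own evaluation loop.

CORRUPTIONS = [
    "gaussian_noise", "shot_noise", "impulse_noise", "defocus_blur",
    "glass_blur", "motion_blur", "zoom_blur", "snow", "frost", "fog",
    "brightness", "contrast", "elastic_transform", "pixelate",
    "jpeg_compression",
]

def robustness(model, evaluate_accuracy):
    accs = [
        evaluate_accuracy(model, corruption, severity)
        for corruption in CORRUPTIONS
        for severity in range(1, 6)  # intensity levels 1..5
    ]
    return sum(accs) / len(accs)
```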
In particular, I recommend measuring the robustness as follows: run
robustness.ipynb
to get the predictive performance of a pretrained model on each of the 75 datasets. CIFAR-{10, 100}-C will be downloaded automatically. You will then get a performance sheet like the sample robustness sheet, with the columns "Intensity", "Type", "NLL", "Cutoff1", "Cutoff2", "Acc", "Acc-90", "Unc", "Unc-90", "IoU", "IoU-90", "Freq", "Freq-90", "Top-5", "Brier", "ECE", and "ECSE". We only use the accuracy column ("Acc").
To avoid confusion: strictly speaking, we do not use the following corruption types for evaluation: "speckle_noise", "gaussian_blur", "spatter", "saturate". Another metric, mCE (not used in this paper), is also commonly used to measure robustness.
The batch size is 256 by default, but I believe the robustness is independent of the batch size.
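If you already have a robustness sheet, the same number can be computed directly from it. A short sketch, assuming the sheet was exported as CSV (the file name below is just a placeholder):

```python
import pandas as pd

# Placeholder file name; use the robustness sheet produced by robustness.ipynb.
sheet = pd.read_csv("robustness_sheet.csv")

# Drop the corruption types that are not part of the 15-type benchmark.
excluded = {"speckle_noise", "gaussian_blur", "spatter", "saturate"}
sheet = sheet[~sheet["Type"].isin(excluded)]

# Robustness = mean of the "Acc" column over the remaining
# 15 corruption types x 5 intensity levels (75 rows).
print("robustness:", sheet["Acc"].mean())
```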
Closing this issue based on the comment above. Please feel free to reopen this issue if the problem still exists.
Sure thing, please close the issue.
I think it would be great to have access to the intermediate results to (re-)produce the robustness numbers.
I gathered from the robustness notebook that I'd have to retrain all cited models (as I cannot call models.load(name, ...) in my environment) and (to be honest) didn't want to invest the CO2 for this.
But maybe the .pth
checkpoints are available for download and I misread the docs. Please accept my apologies if that is the case.
Thank you for your constructive feedback. I agree with your comments that releasing intermediate results would be helpful, because evaluating pretrained models on 75 datasets can be resource intensive. I will release robustness sheets as intermediate results for some models, and make the pretrained models easily accessible.
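As a rough sketch of what "easily accessible" would look like once the checkpoints are up (the constructor and file names below are placeholders, not the actual release artifacts):

```python
import torch

# Placeholder names: `build_model` stands in for whatever model constructor the
# repository provides, and the .pth file name for the released checkpoint.
model = build_model("vit_ti")
state = torch.load("vit_ti_cifar100.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()
```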
Hi,
thank you for this wonderful work on vision transformers and how to understand them. I have some simple questions, which I must apologize for. I tried to reproduce figure 12 independently of your code base, but I struggle a bit to understand the code. Is it correct that you define robustness as
robustness = mean(accuracy(y_val_true, y_val_pred))
? Related to this, do I understand correctly that you compute this accuracy on batches of the validation dataset? These batches are of size 256, right?
Thanks.