pokaxpoka / deep_Mahalanobis_detector

Code for the paper "A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks".

Validation on adversarial samples for OOD detection #1

Open nom opened 5 years ago

nom commented 5 years ago

In the paper you mention that you validate the hyperparameters for the input preprocessing (the FGSM perturbation magnitude) and the feature ensemble using adversarial samples (right part of Table 2 in the paper). I think this validation makes more sense than validation using OOD samples, since, as you say, OOD samples are often inaccessible a priori.

I cannot seem to find the part of the code that performs this validation, and I was wondering specifically how you validate the FGSM magnitude when using adversarial samples: the in-distribution samples are preprocessed with FGSM in the same way as the adversarial samples, correct? In that case, I guess the only difference between in-distribution and adversarial samples is that the adversarial samples have gone through one extra FGSM optimization step?
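To make my question concrete, here is how I currently picture the preprocessing and the magnitude selection. This is only a rough PyTorch sketch of my understanding, not your code; the helper `mahalanobis_score` is hypothetical and stands for the layer-wise confidence score $M(x) = \max_c -(f(x)-\mu_c)^\top \Sigma^{-1} (f(x)-\mu_c)$.

```python
import torch

def preprocess_and_score(model, mahalanobis_score, x, eps):
    """Apply the FGSM-like input preprocessing, then re-score the result."""
    x = x.clone().detach().requires_grad_(True)
    score = mahalanobis_score(model, x)          # confidence of the closest class
    (-score.sum()).backward()                    # gradient of the negative score w.r.t. x
    x_hat = (x - eps * x.grad.sign()).detach()   # small step that increases the score
    return mahalanobis_score(model, x_hat)
```

My assumption is that both the in-distribution and the adversarial validation samples go through exactly this preprocessing, and that eps is then chosen to maximize the separation (e.g. validation AUROC) between the two resulting score distributions. Is that right?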

If you could clarify or point me to the code section, that would be great.

BTW nice work!

inkawhich commented 4 years ago

I also have some questions about this experiment ("Comparison of robustness" part of Section 3.1):

  1. At this point, should we assume that the M(x) models have already been "trained", in the sense that we have already computed $\mu_c$ and $\Sigma$ for each layer using only Cifar10-Train-Clean data? (See the first sketch below for what I mean.)

  2. When training the feature ensemble weights, do you use Cifar10-Test-Clean as the positive samples and Cifar10-Test-FGSM as the negative samples? Or do you train the ensemble weights using Cifar10-Train-Clean and Cifar10-Train-FGSM data? (See the second sketch below.)

  3. What epsilon do you use for FGSM step?

  4. How critical is the input preprocessing step to this method? Is the performance of the feature ensemble still pretty good when we do validation on FGSM samples but do not do the input preprocessing step?
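For question 1, this is roughly what I mean by "trained". It is only a sketch under my own assumptions: `feats` is an `[N, D]` tensor of features from one layer and `labels` holds the corresponding class labels, both computed on Cifar10-Train-Clean.

```python
import torch

def fit_gaussian_params(feats, labels, num_classes):
    """Class-conditional means and a single tied covariance for one layer."""
    means = torch.stack([feats[labels == c].mean(0) for c in range(num_classes)])
    centered = feats - means[labels]                 # subtract each sample's class mean
    cov = centered.t() @ centered / feats.size(0)    # tied (shared) covariance
    precision = torch.linalg.inv(cov + 1e-6 * torch.eye(feats.size(1)))
    return means, precision
```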
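For question 2, I am picturing the ensemble weights being learned by a logistic regression over the per-layer scores, along these lines (again just a sketch; `scores_in` and `scores_adv` are assumed to be `[N, num_layers]` arrays of per-layer Mahalanobis scores for clean and FGSM validation samples):

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

def fit_ensemble_weights(scores_in, scores_adv):
    """Learn per-layer weights by separating clean from adversarial scores."""
    X = np.concatenate([scores_in, scores_adv], axis=0)
    y = np.concatenate([np.ones(len(scores_in)),     # positives: in-distribution
                        np.zeros(len(scores_adv))])  # negatives: adversarial (FGSM)
    reg = LogisticRegressionCV(n_jobs=-1).fit(X, y)
    return reg.coef_, reg.intercept_
```

What I am unsure about is which split (train vs. test) these positive and negative samples come from, hence the question above.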

Thanks in advance.