Detection as a defense - Githubissues

The idea is to differentiate benign samples (BS) from adversarial examples (AE) based on the outputs they produce across distinct transform models.

In Detecing adversarial samples from artifacts.pdf, it is shown that different models make different mistakes when presented with the same AEs. 2018-arXiv-PictureAE-Picture_AE_detection_bimodel.pdf proposes a bi-model approach that concatenates the outputs of two distinct models for an image as its feature representation and then feeds that representation to a binary classifier. The approach is claimed to reach >90% detection accuracy on MNIST and CIFAR-10.

We can concatenate (stack) the outputs of the transform models for an input image, use the result as a representation of the image, and feed it into a binary classifier. This might yield higher detection accuracy and generalize better across different types of attacks.
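A minimal sketch of the feature construction described above, assuming each transform model returns one probability vector per image. The function name `build_detector_features`, the toy outputs, and the label encoding are all hypothetical and only illustrate the shape of the data; the resulting matrix could then be passed to any binary classifier.

```python
import numpy as np

def build_detector_features(model_outputs):
    """Concatenate per-model prediction vectors into one feature row per image.

    model_outputs: sequence with one entry per transform model, each of
    shape (n_samples, n_classes).
    Returns an array of shape (n_samples, n_models * n_classes).
    """
    arrays = [np.asarray(out, dtype=float) for out in model_outputs]
    n_samples = arrays[0].shape[0]
    if any(a.ndim != 2 or a.shape[0] != n_samples for a in arrays):
        raise ValueError("each model output must have shape (n_samples, n_classes)")
    return np.concatenate(arrays, axis=1)

# Hypothetical outputs of two transform models for three images, two classes.
out_a = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
out_b = [[0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]

features = build_detector_features([out_a, out_b])
labels = np.array([0, 0, 1])  # assumed encoding: 0 = BS, 1 = AE
```

Each row of `features` holds the stacked outputs of all models for one image, so the detector sees both what each model predicts and how far the models disagree.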

Investigation:

Identify patterns in the prediction outputs of BS and AE. For BS and for each type of AE, draw a boxplot of the average, minimum, and maximum accuracy across all transform models.
Detection approach 1: majority voting. The prediction outputs for AEs are much more diverse than those for BS. That is, the number of transform models that agree with each other on an AE will be much smaller than on a BS. If this number is below some threshold, say 75% × total_number_of_models, the input image is marked as an AE. Otherwise, it is considered a benign sample.
Detection approach 2: distance matrix 【empirical evidence to be collected】. The distance between the prediction outputs of a benign sample from two distinct transform models should be close to 0, while the distance between the prediction outputs of an AE from two distinct transform models should be much larger than 0. Try different distance metrics: L2, entropy, KL divergence, cosine, correlation.
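The majority-voting rule of detection approach 1 could be sketched as follows. The function name `is_adversarial_by_vote` and the parameter `agreement_threshold` are hypothetical; the default of 0.75 only mirrors the example threshold given above, and a suitable value would have to be found empirically.

```python
from collections import Counter

def is_adversarial_by_vote(predicted_labels, agreement_threshold=0.75):
    """Flag an input as an AE when too few transform models agree.

    predicted_labels: one predicted class label per transform model.
    agreement_threshold: fraction of models that must share the majority
    label for the input to be treated as benign.
    """
    if not predicted_labels:
        raise ValueError("need at least one prediction")
    # Size of the largest group of models that predict the same label.
    _, majority_count = Counter(predicted_labels).most_common(1)[0]
    return majority_count < agreement_threshold * len(predicted_labels)

# Hypothetical predictions from eight transform models.
benign_votes = [3, 3, 3, 3, 3, 3, 3, 5]       # 7 of 8 agree
adversarial_votes = [3, 5, 3, 1, 3, 7, 2, 3]  # only 4 of 8 agree

benign_flag = is_adversarial_by_vote(benign_votes)
adversarial_flag = is_adversarial_by_vote(adversarial_votes)
```

With eight models the cutoff is 6 agreeing models, so the first input is treated as benign and the second is flagged as an AE.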

【distance matrix】: for an image, create a distance matrix by computing the distance between its prediction outputs for each pair of transform models. Investigate any property of this matrix that differs between AE and BS.
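A minimal sketch of the pairwise distance matrix, assuming each transform model outputs a probability vector for the image. All function names and the toy outputs below are hypothetical, and only three of the listed metrics (L2, cosine, KL divergence) are shown. Note that KL divergence is not symmetric, so its matrix is not symmetric either.

```python
import numpy as np

def l2_distance(p, q):
    return float(np.linalg.norm(p - q))

def cosine_distance(p, q):
    return float(1.0 - np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))

def kl_divergence(p, q, eps=1e-12):
    # Clip to avoid log(0); assumes p and q are probability vectors.
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def distance_matrix(outputs, metric=l2_distance):
    """Entry (i, j) is the distance between the outputs of models i and j.

    outputs: array of shape (n_models, n_classes) for a single image.
    """
    outputs = np.asarray(outputs, dtype=float)
    n_models = outputs.shape[0]
    dist = np.zeros((n_models, n_models))
    for i in range(n_models):
        for j in range(n_models):
            if i != j:
                dist[i, j] = metric(outputs[i], outputs[j])
    return dist

def mean_off_diagonal(dist):
    """Average distance over all ordered pairs of distinct models."""
    n_models = dist.shape[0]
    return float(dist.sum() / (n_models * (n_models - 1)))

# Hypothetical outputs of three transform models for one image each.
bs_outputs = [[0.90, 0.10], [0.85, 0.15], [0.90, 0.10]]
ae_outputs = [[0.90, 0.10], [0.10, 0.90], [0.50, 0.50]]

bs_matrix = distance_matrix(bs_outputs)
ae_matrix = distance_matrix(ae_outputs)
```

One simple summary to compare between AE and BS is the mean off-diagonal distance; whether it actually separates the two classes is exactly the empirical evidence still to be collected.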

softsys4ai / athena

Detection as a defense #11