Open oceank opened 5 years ago
Based on the initial experiment (see the detailed results in commit d0190cbaba6839e5b8fe41e6c0e21c615a7d7db8), detection approach 1 combined with majority-voting-based classification (recovery) achieves a very high success rate. Here success means either correctly detecting an AE as an AE, or classifying the input image to its true label, whether it is a benign sample or an AE.
The hypothesis about the distance matrix (using the L2 norm) is confirmed. For different types of AEs, rich pattern information shows up in the heatmaps of their distance matrices. More investigation is encouraged.
The idea is to differentiate benign samples (BS) from adversarial examples (AE) based on their outputs from distinct transform models.
In Detecing adversarial samples from artifacts.pdf, it is shown that different models make different mistakes when presented with the same AEs. In addition, 2018-arXiv-PictureAE-Picture_AE_detection_bimodel.pdf proposes a bi-model approach that concatenates the outputs of two distinct models for an image as its feature representation and then feeds it to a binary classifier. The approach is claimed to reach >90% detection accuracy on MNIST and CIFAR-10.
We can concatenate/stack the outputs of the transform models for an input image, use the result as a representation of the image, and feed it into a binary classifier. This might achieve higher detection accuracy and generalize better across different types of attacks.
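To make the stacking idea concrete, here is a minimal sketch of building the concatenated representation. It assumes each transform model returns a probability vector per image; the function name `stacked_representation`, the array layout, and the toy data are illustrative and not taken from the repository.

```python
import numpy as np

def stacked_representation(model_outputs):
    """Concatenate per-model output vectors into one feature vector per image.

    model_outputs: array of shape (n_models, n_samples, n_classes)
                   (assumed layout, hypothetical).
    Returns an array of shape (n_samples, n_models * n_classes).
    """
    outputs = np.asarray(model_outputs, dtype=float)
    n_models, n_samples, n_classes = outputs.shape
    # Put the sample axis first, then flatten the (model, class) axes.
    return outputs.transpose(1, 0, 2).reshape(n_samples, n_models * n_classes)

# Toy data: 3 hypothetical transform models, 2 images, 4 classes.
rng = np.random.default_rng(0)
toy_outputs = rng.dirichlet(np.ones(4), size=(3, 2))  # shape (3, 2, 4)
features = stacked_representation(toy_outputs)
print(features.shape)
```

The resulting `features` matrix, paired with BS/AE labels, could then be passed to any binary classifier (for example scikit-learn's `LogisticRegression`); which classifier works best here is an open question.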
Investigation:
Identify patterns in the prediction outputs of BS and AE: for BS and for each type of AE, plot a boxplot of the average, min, and max accuracy across all transform models.
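One possible way to compute the numbers behind such a boxplot is sketched below. It assumes predicted labels are stored as an array of shape (n_models, n_samples); the helper names, the toy predictions, and the sample-type names "BS" and "AE (toy)" are made up for illustration.

```python
import numpy as np

def accuracy_summary(predicted_labels, true_labels):
    """Per-model accuracy plus its mean/min/max over all transform models."""
    preds = np.asarray(predicted_labels)   # (n_models, n_samples), assumed layout
    truth = np.asarray(true_labels)        # (n_samples,)
    per_model = (preds == truth[None, :]).mean(axis=1)
    return {
        "per_model": per_model,
        "mean": float(per_model.mean()),
        "min": float(per_model.min()),
        "max": float(per_model.max()),
    }

def plot_accuracy_boxplot(per_model_by_type, path="accuracy_boxplot.png"):
    """Draw one box per sample type (BS and each AE type)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    names = list(per_model_by_type)
    fig, ax = plt.subplots()
    ax.boxplot([per_model_by_type[name] for name in names])
    ax.set_xticks(range(1, len(names) + 1))
    ax.set_xticklabels(names)
    ax.set_ylabel("accuracy across transform models")
    fig.savefig(path)
    plt.close(fig)

# Toy data: 3 hypothetical transform models, 4 images.
true_labels = [0, 1, 2, 1]
bs_preds = [[0, 1, 2, 1], [0, 1, 2, 0], [0, 1, 2, 1]]
ae_preds = [[3, 1, 0, 2], [0, 0, 2, 2], [1, 2, 0, 0]]
bs_stats = accuracy_summary(bs_preds, true_labels)
ae_stats = accuracy_summary(ae_preds, true_labels)
```

Calling `plot_accuracy_boxplot({"BS": bs_stats["per_model"], "AE (toy)": ae_stats["per_model"]})` would then write the figure to disk.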
Detection approach 1: majority voting. The prediction outputs of AEs are much more diverse than those of BS. That is, the number of transform models that agree with each other on an AE will be much smaller than on a BS. If this number is below some threshold, say 75% × total_number_of_models, the input image is marked as an AE. Otherwise, it is considered a benign sample.
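A minimal sketch of the majority-voting rule, assuming one predicted label per transform model for a single image. The function name and return format are hypothetical; the 0.75 default mirrors the example threshold above, and the majority label doubles as the recovered classification.

```python
import numpy as np

def majority_vote_detect(predicted_labels, agreement_threshold=0.75):
    """Flag an image as AE when too few transform models agree.

    predicted_labels: 1-D sequence, one predicted label per transform model.
    Returns (is_adversarial, majority_label, agreement_ratio).
    """
    labels = np.asarray(predicted_labels)
    values, counts = np.unique(labels, return_counts=True)
    top = counts.argmax()
    agreement = counts[top] / labels.size
    # "Below the threshold" means AE; exactly at the threshold counts as benign.
    is_adversarial = agreement < agreement_threshold
    return bool(is_adversarial), int(values[top]), float(agreement)

# Toy predictions from 8 hypothetical transform models.
benign_like = majority_vote_detect([7, 7, 7, 7, 7, 7, 7, 2])
adversarial_like = majority_vote_detect([7, 3, 3, 1, 9, 3, 2, 7])
print(benign_like, adversarial_like)
```

When several labels tie for the highest count, `np.unique` ordering means the smallest label wins; a real implementation may want a different tie-breaking rule.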
Detection approach 2: distance matrix. 【empirical evidence to collect】: the distance between the prediction outputs of a benign sample from two distinct transform models is close to 0, while the distance between the prediction outputs of an AE from two distinct transform models should be much larger than 0. Try different distance metrics: L2, entropy, KL divergence, cosine, correlation.
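The candidate metrics can be sketched with NumPy alone, as below. This assumes the prediction outputs are probability vectors. The function names and toy vectors are illustrative, and the "entropy" metric is left out because the exact definition intended is not specified here.

```python
import numpy as np

EPS = 1e-12  # avoids log(0) and division by zero in KL

def l2_distance(p, q):
    return float(np.linalg.norm(np.asarray(p, float) - np.asarray(q, float)))

def kl_divergence(p, q):
    """KL(p || q) after clipping and renormalizing both vectors."""
    p = np.clip(np.asarray(p, float), EPS, None)
    q = np.clip(np.asarray(q, float), EPS, None)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def cosine_distance(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(1.0 - p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

def correlation_distance(p, q):
    """1 - Pearson correlation; undefined for constant vectors."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return cosine_distance(p - p.mean(), q - q.mean())

# Toy outputs of two hypothetical transform models for one image each.
bs_pair = ([0.90, 0.05, 0.05], [0.88, 0.07, 0.05])  # models agree
ae_pair = ([0.90, 0.05, 0.05], [0.10, 0.80, 0.10])  # models disagree

metrics = (l2_distance, kl_divergence, cosine_distance, correlation_distance)
for metric in metrics:
    print(metric.__name__, metric(*bs_pair), metric(*ae_pair))
```

Note that KL divergence is not symmetric, so a symmetrized variant may be preferable when filling a pairwise distance matrix.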
【distance matrix】: for an image, create a distance matrix by computing the distance between its prediction outputs for each pair of transform models. Investigate any property that differs between AE and BS.
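As a rough sketch of the distance matrix with the L2 norm, assuming the outputs of all transform models for one image are stacked into an array of shape (n_models, n_classes). The function name and toy outputs are hypothetical.

```python
import numpy as np

def pairwise_l2_matrix(outputs):
    """L2 distance between the outputs of every pair of transform models.

    outputs: array of shape (n_models, n_classes) for a single image.
    Returns a symmetric (n_models, n_models) matrix with a zero diagonal.
    """
    x = np.asarray(outputs, dtype=float)
    diff = x[:, None, :] - x[None, :, :]   # (n_models, n_models, n_classes)
    return np.linalg.norm(diff, axis=-1)

# Toy outputs from 3 hypothetical transform models.
bs_outputs = [[0.90, 0.05, 0.05],
              [0.88, 0.07, 0.05],
              [0.92, 0.04, 0.04]]
ae_outputs = [[0.90, 0.05, 0.05],
              [0.10, 0.80, 0.10],
              [0.20, 0.10, 0.70]]

bs_matrix = pairwise_l2_matrix(bs_outputs)
ae_matrix = pairwise_l2_matrix(ae_outputs)
print(bs_matrix.mean(), ae_matrix.mean())
```

Each matrix can be rendered as a heatmap, for instance with matplotlib's `imshow`, to compare the patterns of BS and the different AE types.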