rfeinman / detecting-adversarial-samples

Code for "Detecting Adversarial Samples from Artifacts" (Feinman et al., 2017)

Method is not oblivious to attack algorithms? #6

Open davidglavas opened 5 years ago

davidglavas commented 5 years ago

In the abstract of the paper it says "The result is a method for implicit adversarial detection that is oblivious to the attack algorithm."

But the final detector is trained on adversarial examples generated by specific attack algorithms (see lines 151 and 153):

https://github.com/rfeinman/detecting-adversarial-samples/blob/2c26b603bfadc25521c2bd4c8cc838ac4a484319/scripts/detect_adv_samples.py#L149-L155

rfeinman commented 5 years ago

@davidglavas Our detector is composed of 2 metrics: Bayesian uncertainty and kernel density. These metrics each work quite well as standalone detectors, and they are oblivious to the attack algorithm. The only "training" we do is to learn how much to weight uncertainty vs. density (we learn a simple 2D weight vector [w1, w2]). You can specify this vector by hand, but for convenience we learned the weights by looking at a few common attack algorithms.
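
To illustrate what that weighting step amounts to, here is a minimal sketch using placeholder scores and scikit-learn's logistic regression; the array names and values are made up for illustration and are not the repo's actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical precomputed features: Bayesian (MC-dropout) uncertainty and
# kernel density scores for clean and adversarial samples. The values below
# are stand-ins, not outputs of the actual detection scripts.
uncerts_clean = np.random.rand(500)
uncerts_adv = np.random.rand(500) + 0.5
dens_clean = np.random.rand(500) + 0.5
dens_adv = np.random.rand(500)

# Stack the two metrics into a 2-D feature vector per sample.
X = np.column_stack([
    np.concatenate([uncerts_clean, uncerts_adv]),
    np.concatenate([dens_clean, dens_adv]),
])
y = np.concatenate([np.zeros(500), np.ones(500)])  # 0 = clean, 1 = adversarial

# "Training" the detector is just fitting a 2-D weight vector [w1, w2]
# (plus a bias) that balances uncertainty against density.
lr = LogisticRegression()
lr.fit(X, y)
w1, w2 = lr.coef_[0]
print("learned weights: w1=%.3f (uncertainty), w2=%.3f (density)" % (w1, w2))
```

In this sketch the adversarial examples only enter through the fitting of [w1, w2]; the uncertainty and density scores themselves are computed without reference to any attack algorithm, which is the sense in which the standalone metrics are attack-oblivious.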