[Open] davidglavas opened this issue 5 years ago
In the abstract of the paper it says: "The result is a method for implicit adversarial detection that is oblivious to the attack algorithm."

But the final detector is trained on adversarial examples generated by specific attack algorithms (see lines 151 and 153):

https://github.com/rfeinman/detecting-adversarial-samples/blob/2c26b603bfadc25521c2bd4c8cc838ac4a484319/scripts/detect_adv_samples.py#L149-L155

@davidglavas Our detector is composed of two metrics: Bayesian uncertainty and kernel density. Each metric works considerably well as a standalone detector, and both are oblivious to the attack algorithm. The only "training" we do is to learn how much to weight uncertainty vs. density (we learn a simple 2D weight vector [w1, w2]). You can specify this vector by hand, but for convenience we learned the weights by looking at a few common attack algorithms.
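For readers following along, here is a minimal sketch of the weighting step described above: the two per-sample scores (uncertainty and density) are stacked into 2D features and fed to a logistic-regression classifier, whose coefficients play the role of the weight vector [w1, w2]. The function names, the use of scikit-learn, and the 0.5 threshold are illustrative assumptions, not the repo's actual code.

```python
# A minimal sketch (not the repo's exact code) of learning how much to
# weight uncertainty vs. density with a 2D weight vector [w1, w2].
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_combiner(uncerts, densities, labels):
    """Learn [w1, w2] weighting uncertainty vs. density.

    uncerts, densities: per-sample scores from the two standalone
    detectors, shape (n,); labels: 1 = adversarial, 0 = normal.
    """
    X = np.stack([uncerts, densities], axis=1)  # one [u, d] row per sample
    clf = LogisticRegression()
    clf.fit(X, labels)
    return clf  # clf.coef_[0] holds the learned [w1, w2]

def detect(clf, uncerts, densities, threshold=0.5):
    """Flag samples whose combined adversarial score exceeds the threshold."""
    X = np.stack([uncerts, densities], axis=1)
    return clf.predict_proba(X)[:, 1] > threshold
```

Because the combiner is just a 2D linear model, the same pipeline also supports the hand-specified option mentioned above: skip `fit_combiner` and threshold `w1 * u + w2 * d` directly with weights of your choosing.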