Early-stop Decision Attack

openphilanthropy / unrestricted-adversarial-examples

Contest Proposal and infrastructure for the Unrestricted Adversarial Examples Challenge

Apache License 2.0


Early-stop Decision Attack #22

Closed: carlini closed this issue 6 years ago

carlini commented 6 years ago

With an L2 threshold of 4 (or 10), we can stop the decision-only attack as soon as it succeeds. We don't need the best (smallest) adversarial example, only whether one exists within the specified threshold, and a lot of the attack's work is spent refining the perturbation at the end.
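To make the idea concrete, here is a minimal Python sketch of where an early-stop check sits in a decision-only attack loop. It is a toy random-walk attack written for illustration, not Foolbox's BoundaryAttack; `decision_attack`, its parameters, and its step rule are all hypothetical. The only piece early stopping needs is the distance check at the top of the loop: once the current adversarial is within `l2_threshold`, the attack returns instead of spending more queries shrinking the perturbation.

```python
import numpy as np


def decision_attack(is_adversarial, original, starting_point, l2_threshold=None,
                    max_steps=10_000, step=0.01, seed=0):
    """Toy decision-only attack (hypothetical; NOT Foolbox's BoundaryAttack).

    Only the model's decision (adversarial or not) is queried. If
    `l2_threshold` is set, the attack stops as soon as it holds an
    adversarial example within that L2 distance of `original`.
    Returns (best adversarial found, number of candidate queries in the loop).
    """
    rng = np.random.default_rng(seed)
    original = np.asarray(original, dtype=float)
    best = np.asarray(starting_point, dtype=float).copy()
    if not is_adversarial(best):
        raise ValueError("starting_point must already be adversarial")
    queries = 0
    for _ in range(max_steps):
        dist = np.linalg.norm(best - original)
        # Early stop: any adversarial inside the threshold is good enough.
        if l2_threshold is not None and dist <= l2_threshold:
            break
        # Propose a random step scaled to the current distance ...
        noise = rng.standard_normal(best.shape) * step * dist / np.sqrt(best.size)
        candidate = best + noise
        # ... then contract slightly toward the original input.
        candidate += step * (original - candidate)
        queries += 1
        # Keep the candidate only if it is still adversarial and closer.
        if is_adversarial(candidate) and np.linalg.norm(candidate - original) < dist:
            best = candidate
    return best, queries
```

With a toy "model" such as `lambda x: x[0] > 1.0`, an original at the origin, and a start at `(5, 5)`, calling it with `l2_threshold=4.0` exits after far fewer queries than with `l2_threshold=None`, which keeps refining for all `max_steps` queries. That tail of refinement is the extra work this issue proposes to skip.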

wielandbrendel commented 6 years ago

I'm the author of the Boundary Attack, and I second @carlini: if any adversarial example below the threshold is good enough, you should definitely use early stopping, since most of the effort goes into the last 10% of the optimization rather than the first 90%. This is not yet implemented in Foolbox, but we can add it if you are interested.

carlini commented 6 years ago

That would be great, if you'd like to do it. On defended models I'm not sure how much this will help, since the attack may often fail to get under the threshold at all (and then never stops early), but on undefended models it should speed up evaluation significantly.

jonasrauber commented 6 years ago

Early stopping is now supported in all Foolbox attacks, including the Boundary Attack: https://github.com/bethgelab/foolbox/pull/213

jonasrauber commented 6 years ago

@carlini I just implemented this and released it as part of Foolbox 1.5.0.

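You can use it by changing

self.attack = FoolboxBoundaryAttack(model=Model())

to

# l2_threshold: the L2 budget (e.g. 4 or 10)
# number_of_pixels: number of values in one input image (all channels)
mse_threshold = l2_threshold**2 / number_of_pixels
self.attack = FoolboxBoundaryAttack(model=Model(), threshold=mse_threshold)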

(For now, the Boundary Attack's early-stopping threshold must be specified in mean-squared-error units, assuming images in [0, 1]; that is why the L2 threshold is squared and divided by the number of pixels.)
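As a worked example of this conversion: with images in [0, 1], the MSE is the mean of the squared per-value differences, so an L2 distance of `l2_threshold` corresponds to `l2_threshold**2 / n`, where `n` is the number of values in the image. The sketch below assumes `n` counts every entry (height x width x channels); `l2_to_mse_threshold` is a hypothetical helper, and the 28x28x1 shape is only an illustrative assumption, not necessarily the challenge's input size.

```python
import numpy as np


def l2_to_mse_threshold(l2_threshold, image_shape):
    """Hypothetical helper: convert an L2 budget into MSE units.

    Assumes images scaled to [0, 1] and MSE = mean over all values of
    (x - x0)**2, so MSE = ||x - x0||_2**2 / n with n = total number of values.
    """
    n = int(np.prod(image_shape))  # height * width * channels
    return l2_threshold ** 2 / n


mse_threshold = l2_to_mse_threshold(4.0, (28, 28, 1))

# Sanity check: spread an L2 norm of exactly 4 evenly over all 784 values.
delta = np.full((28, 28, 1), 4.0 / np.sqrt(28 * 28))
print(mse_threshold, np.linalg.norm(delta), np.mean(delta ** 2))
```

For a 28x28 grayscale image and an L2 threshold of 4, this gives 16 / 784 ≈ 0.0204, and the sanity check shows that a perturbation with L2 norm exactly 4 has exactly that MSE.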

carlini commented 6 years ago

Perfect! I'll take a look at this tomorrow. To keep backwards compatibility, I'll probably emit a warning and fall back to the previous behavior if the installed Foolbox version is older than 1.5.
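A minimal sketch of that fallback, assuming the Foolbox version is available as a string (for example via `foolbox.__version__`). The helper name `boundary_attack_kwargs` is hypothetical, and this is not necessarily how the change in 46347103 was implemented. It passes `threshold` only when the version is at least 1.5; otherwise it warns and returns no extra arguments, which keeps the old behavior without early stopping.

```python
import re
import warnings


def boundary_attack_kwargs(foolbox_version, mse_threshold):
    """Hypothetical helper: extra keyword arguments for the Boundary Attack.

    The `threshold` argument (early stopping) needs Foolbox >= 1.5.0. On
    older versions, warn and return no extra arguments, i.e. fall back to
    running the attack without early stopping.
    """
    match = re.match(r"(\d+)\.(\d+)", foolbox_version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % foolbox_version)
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) >= (1, 5):
        return {"threshold": mse_threshold}
    warnings.warn(
        "Foolbox %s does not support early stopping; upgrade to >= 1.5 for "
        "faster evaluation" % foolbox_version
    )
    return {}


# Usage inside the attack wrapper (names as in the snippet earlier in the thread):
#   kwargs = boundary_attack_kwargs(foolbox.__version__, mse_threshold)
#   self.attack = FoolboxBoundaryAttack(model=Model(), **kwargs)
```

Returning an empty dict on old versions means the constructor call itself never changes shape, so the same wrapper code works on both sides of the version boundary.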

carlini commented 6 years ago

Done in 46347103.