openphilanthropy / unrestricted-adversarial-examples

Contest Proposal and infrastructure for the Unrestricted Adversarial Examples Challenge
Apache License 2.0

Hidden randomness in defenses #65

Open yi-sun opened 5 years ago

yi-sun commented 5 years ago

The contest proposal states:

The following would not be shared: exact sequence of randomness during evaluation (e.g. np.seed)

We have a few clarifying questions:

  1. When submitting a defense, is the defense required to perform well for all values of np.seed, or may the defenders specify a specific value which is hidden from attackers?
  2. In the latter case, how would this be implemented in the Docker framework?

carlini commented 5 years ago

We haven't carefully considered this yet. I would be partial to saying that a defense should work with any random seed, but that it is free to choose a fresh seed every time it classifies an image.
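Here is a minimal sketch of the "fresh seed every time it classifies an image" option. It assumes a hypothetical wrapper, `make_randomized_defense`, around a batch-in, logits-out model function. The noise-averaging defense, the toy model, and every name below are illustrative only and are not part of the contest infrastructure. The wrapper stores no seed. Each call builds an unseeded `np.random.default_rng()`, which draws fresh entropy from the OS, so the defender holds no secret.

```python
import numpy as np


def make_randomized_defense(base_model_fn, noise_std=0.05, n_samples=8):
    """Hypothetical wrapper: average base_model_fn logits over noisy copies of the input.

    No seed is stored anywhere. Every call creates a new unseeded generator,
    so the defense must work for whatever randomness it happens to draw.
    """
    def defended_model_fn(images):
        rng = np.random.default_rng()  # fresh OS entropy on every call, nothing secret
        logits = []
        for _ in range(n_samples):
            noise = rng.normal(0.0, noise_std, size=images.shape)
            noisy = np.clip(images + noise, 0.0, 1.0)  # keep pixels in [0, 1]
            logits.append(base_model_fn(noisy))
        return np.mean(logits, axis=0)  # average logits over the noisy samples
    return defended_model_fn


def toy_model_fn(images):
    # Hypothetical two-class model: logit 0 is mean brightness, logit 1 its negation.
    m = images.reshape(len(images), -1).mean(axis=1)
    return np.stack([m, -m], axis=1)


defense = make_randomized_defense(toy_model_fn)
batch = np.full((4, 8, 8, 3), 0.9)  # four bright 8x8 RGB images
print(defense(batch).argmax(axis=1))
```

The secret-seed alternative discussed in this thread would instead call `default_rng(SECRET_SEED)`, where `SECRET_SEED` is a hypothetical value known only to the defender.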

If we instead allow the defense to only work with one seed the defender knows and the attacker doesn't, we're no longer in a fully white-box threat model: the defender now gets to hold something secret.

But I think it would be worth discussing this to make sure there aren't any unintended consequences. Can you think of a defense where it makes sense to only work for one random seed but not others?

yi-sun commented 5 years ago

We have been testing a specific defense idea leveraging private randomness which I've emailed you about privately. Please let me know if you'd prefer to keep the rules discussion on this thread, in which case I'll try to rephrase our idea in a less specific way.

carlini commented 5 years ago

Let me take a look at your email.

carlini commented 5 years ago

I've been giving this some thought. I'm inclined to say "no": defenses must work with an arbitrary seed. If we allow defenses to have a secret seed, what's to stop them from using it to initialize some of the neural network's weights? Then we would have a grey-box threat model, which we explicitly want to avoid.

@catherio @nottombrown do you have any thoughts?

catherio commented 5 years ago

That's my inclination, too, but maybe you could forward the email so I can think about this specific case?

catherio commented 5 years ago

Ok, having read this, I agree with @carlini. The randomness should be viewed as coming from "the world": the defender has to accept whatever it is given and work well in all such situations.
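This rule could be read as an evaluation protocol. The sketch below rests on several assumptions. The evaluator, not the defender, creates the random generator and passes it in. The defense is then scored by its worst accuracy across many such seeds. The `rng` argument, `noisy_defense`, `worst_case_accuracy`, and the toy brightness model are all hypothetical. The contest's actual Docker harness may expose randomness differently.

```python
import numpy as np


def noisy_defense(images, rng, base_model_fn, noise_std=0.05):
    """Hypothetical defense that consumes randomness handed to it by the evaluator."""
    noise = rng.normal(0.0, noise_std, size=images.shape)
    return base_model_fn(np.clip(images + noise, 0.0, 1.0))


def worst_case_accuracy(defense_fn, images, labels, seeds):
    """Score the defense under each evaluator-chosen seed and return the minimum accuracy."""
    accuracies = []
    for seed in seeds:
        rng = np.random.default_rng(seed)  # randomness comes from "the world", not the defender
        preds = defense_fn(images, rng).argmax(axis=1)
        accuracies.append(float((preds == labels).mean()))
    return min(accuracies)


def toy_model_fn(images):
    # Hypothetical two-class model keyed on mean brightness around 0.5.
    m = images.reshape(len(images), -1).mean(axis=1)
    return np.stack([m - 0.5, 0.5 - m], axis=1)


def defense(images, rng):
    return noisy_defense(images, rng, toy_model_fn)


images = np.concatenate([np.full((3, 8, 8, 3), 0.9), np.full((3, 8, 8, 3), 0.1)])
labels = np.array([0, 0, 0, 1, 1, 1])
print(worst_case_accuracy(defense, images, labels, seeds=range(100)))
```

Reporting the minimum over seeds, rather than the mean, matches the idea that the defense "has to work well under all such situations".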