Defense evaluation time limitations?

openphilanthropy / unrestricted-adversarial-examples

Contest Proposal and infrastructure for the Unrestricted Adversarial Examples Challenge

Apache License 2.0


Defense evaluation time limitations? #48

Closed by erko 5 years ago

erko commented 5 years ago

Hi, I have a question about the defense evaluation time limits. The Contest Proposal says:

throughput of at least 1 image per minute per P100

However, for the warm-up it says:

throughput of at least 100 images per second

Is this a mistake? That is a big difference, and the warm-up limit seems tough (restrictive :)) for defenders.

carlini commented 5 years ago

That is correct. For the warm-up with fixed attacks, we need to be able to make effectively thousands of queries. At 100 images per second, this takes a day. If it were 1 image per second, it would take about 3 months to evaluate, which is far too long. For the complete contest, on the other hand, the attackers win if even a single image is classified incorrectly, so it is okay if evaluating one image takes a minute.

(Also: there is a huge difference in difficulty between the warm-up and the full challenge. So it makes sense to allow the defender to do much more work for the full challenge.)
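As a rough check of the arithmetic above, here is a minimal Python sketch. It assumes the warm-up workload is whatever 100 images per second processes in one day (about 8.64 million image evaluations); that total is inferred from the comment, not an official figure, and the names `evaluation_days` and `IMPLIED_TOTAL_IMAGES` are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def evaluation_days(total_images: float, images_per_second: float) -> float:
    """Days needed to evaluate total_images at a given throughput."""
    return total_images / images_per_second / SECONDS_PER_DAY


# Assumption: workload implied by "at 100 images per second, this takes a day".
IMPLIED_TOTAL_IMAGES = 100 * SECONDS_PER_DAY

# Throughputs mentioned in the thread, expressed in images per second.
throughputs = {
    "100 images/second (warm-up limit)": 100.0,
    "1 image/second": 1.0,
    "1 image/minute (full contest limit)": 1.0 / 60.0,
}

for label, rate in throughputs.items():
    days = evaluation_days(IMPLIED_TOTAL_IMAGES, rate)
    print(f"{label}: about {days:,.0f} days")
```

Under that assumption, the same workload takes about 1 day at the warm-up limit, about 100 days (roughly 3 months) at 1 image per second, and about 6,000 days at 1 image per minute, which is why the slower limit only makes sense for the full contest.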

erko commented 5 years ago

OK, thank you. I agree with you about the difficulty of the full challenge.

Suppose I have some ideas whose performance can't meet the warm-up time limit but may fit within the full challenge's limit. What effect would that have on my participation in the second stage? Is the warm-up stage optional?

carlini commented 5 years ago

The warm-up is completely optional and unrelated to the full challenge. There are two reasons for the warm-up:

1) We want to make sure that defenders can defeat fixed attacks before we open the floodgates to arbitrary attacks. If we can't solve fixed attacks, we certainly can't solve unbounded attacks.

2) We want to check that the dataset, processes, and associated content are all correct and useful. We've found a few problematic ambiguous images, for example, and this has helped us improve our process for image collection and labeling. So before we open up the complete challenge, we want to make sure everything is working as expected.

It is neither necessary nor expected that teams who compete in the final challenge compete in the warm-up (or vice versa), although it certainly won't hurt to have worked with the dataset before.

nottombrown commented 5 years ago

Added to FAQ

nottombrown commented 5 years ago

https://github.com/google/unrestricted-adversarial-examples/blob/master/warmup.md#id-like-to-compete-in-the-full-contest-is-the-warm-up-stage-is-optional