max-andr / square-attack

Square Attack: a query-efficient black-box adversarial attack via random search [ECCV 2020]
https://arxiv.org/abs/1912.00049
BSD 3-Clause "New" or "Revised" License

Why do you use two windows in L2 norm attack? I don't understand the mass moving (Fig.2) #3

machanic closed this issue 4 years ago

machanic commented 4 years ago

Hi, when I read your paper, I wondered: why do you use two windows in the L2-norm attack? I don't understand your L2-norm attack and the mass moving (Fig. 2). Could you explain it in an easy-to-understand way? Thank you!

max-andr commented 4 years ago

Thanks for the interest in our paper!

The Linf attack is simple since for Linf-perturbations each coordinate can be changed up to eps independently of other coordinates. This independence just follows from the fact that the Linf-ball is separable over dimensions.
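To make the separability concrete, here is a minimal numpy sketch (not the repo's actual code; `linf_random_update` is a hypothetical name) showing that in the Linf case each coordinate can be flipped to ±eps independently without any global bookkeeping:

```python
import numpy as np

def linf_random_update(delta, eps, n_coords, rng):
    """Flip a random subset of coordinates of delta to +/-eps.

    Each coordinate is treated independently, which is valid because the
    Linf-ball constraint |delta_i| <= eps is separable over dimensions.
    """
    idx = rng.choice(delta.size, size=n_coords, replace=False)
    new_delta = delta.copy()
    new_delta.flat[idx] = eps * rng.choice([-1.0, 1.0], size=n_coords)
    return new_delta

rng = np.random.default_rng(0)
delta = np.zeros(8)
cand = linf_random_update(delta, eps=0.05, n_coords=3, rng=rng)
# No matter which coordinates were flipped, cand stays inside the Linf-ball.
```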

However, it's not the same for L2-perturbations. There we have some total L2-budget calculated over all coordinates of delta, and not over each coordinate separately. Thus, we have to make sure that when we modify one group of pixels, we do not exceed the total budget. Since on each iteration we are approximately at the boundary of the L2-ball, in order to make a new modification we first have to take some mass from one subset of entries of delta (i.e. to reduce the L2-norm of delta) in order to be able to change some other entries of delta. That's the main reason why we use two windows in the L2 attack.

Here is a toy example to illustrate what we do. Say, the L2-norm of perturbations is 2. Here is the vector we have which has exactly L2-norm of 2: delta_before = [1, 1, 1, 1, 0, 0, 0] But we want to sample a new vector that would have some entries different from delta_before although still being close to it. To achieve this, we can remove some "mass" from the first 2 coordinates of delta_before -> this will lead to a vector [0, 0, 1, 1, 0, 0, 0]. Wonderful, now its L2-norm is equal to sqrt(2), thus we have some "free mass". We can use it to change, say, the last coordinate, i.e. we will get the following vector: delta_after = [0, 0, 1, 1, 0, 0, sqrt(2)] which now has L2-norm of 2 again. So delta_after will be the perturbation that we will use to query the black-box model on to check whether our modification of the perturbation helps to reduce the loss or not.
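The toy calculation above can be checked directly in numpy (this is only the bookkeeping of the example, not the actual attack code):

```python
import numpy as np

eps = 2.0  # total L2 budget
delta_before = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
assert np.isclose(np.linalg.norm(delta_before), eps)  # exactly on the L2-sphere

# Step 1: remove mass from the first two coordinates ("window 1").
delta = delta_before.copy()
delta[:2] = 0.0
freed = eps**2 - np.linalg.norm(delta)**2  # freed squared mass = 2

# Step 2: spend the freed mass on another coordinate ("window 2").
delta[-1] = np.sqrt(freed)
delta_after = delta
# delta_after = [0, 0, 1, 1, 0, 0, sqrt(2)], with L2-norm 2 again.
```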

That's really the core idea, so it's really simple. It just gets a bit more involved when one has to write it down formally (see Algorithm 3), but the actual idea is very intuitive.

max-andr commented 4 years ago

Closing it for now, feel free to reopen if you have additional questions!

machanic commented 4 years ago

@max-andr However, the SignHunter attack (from the paper "Sign bits are all you need") uses a strategy similar to yours, yet its L2-norm attack is not that complicated (no two windows): it simply moves each pixel by the largest amount allowed by the L2-norm budget.

https://github.com/ash-aldujaili/blackbox-adv-examples-signhunter/blob/master/src/attacks/blackbox/sign_attack.py#L62
The above code computes `x + lr * g / norm(g)`:
https://github.com/ash-aldujaili/blackbox-adv-examples-signhunter/blob/master/src/utils/compute_fcts.py#L79

max-andr commented 4 years ago

This approach is essentially equivalent to searching for adversarial examples in the Linf-ball embedded in the L2-ball. It is easy to implement, but it doesn't perform very well, since it searches over a much smaller set (particularly when the input dimension is large). In particular, on ImageNet it performs worse than Bandits:

[figure: ImageNet query-efficiency comparison with Bandits]

They also discuss this issue in more detail in Appendix I, "On the L2-L∞ performance gap", of their paper. Based on my experience, I would say that it is okay to reduce the search space to only the extreme points of the perturbation set (that's what we do in the Square Attack), but it is not okay to exclude a large portion of them (e.g. by restricting delta_i to lie in {-1, 1} or {-1, 0, 1}), which leads to suboptimal results.

machanic commented 4 years ago

I am currently working on research to improve your Square Attack, but many of my attempts have failed; your approach seems to have reached the performance limit of query-based black-box attacks. Do you have any suggestions?

max-andr commented 4 years ago

> Do you have any suggestions?

I think one can still improve the attack in several ways, boosting the results further.

For example, one idea would be that instead of changing the pixels in each square to the same value, one could change them to horizontal or vertical stripes. Currently, we use stripes only to initialize the perturbation, but they can also be used in the sampling distribution to generate a new update. We did some preliminary tests, and that also worked quite well. For the best performance, one could try to adaptively decide on the type of update (constant, horizontal stripes, vertical stripes) based on how well each one works for each image separately. Moreover, one could also try stripes of different widths (not just width 1, as we used in the initialization).
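A rough sketch of what a stripe-shaped update for one square window could look like (`stripe_update` is a hypothetical helper, not part of the repo; stripes of width 1, one random sign per row or column):

```python
import numpy as np

def stripe_update(square_size, eps, orientation, rng):
    """Hypothetical stripe-shaped update for one square window.

    Instead of one constant +/-eps value for the whole square (as in the
    Linf Square Attack), each row ("horizontal") or each column
    ("vertical") of the square gets its own random sign.
    """
    signs = eps * rng.choice([-1.0, 1.0], size=square_size)
    if orientation == "horizontal":
        return np.tile(signs[:, None], (1, square_size))  # each row is constant
    return np.tile(signs[None, :], (square_size, 1))      # each column is constant

rng = np.random.default_rng(0)
u = stripe_update(4, eps=0.05, orientation="horizontal", rng=rng)
# u is a 4x4 window where every entry is +/-0.05 and each row is constant.
```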

Another aspect that can probably be improved is the schedule for the size of the square. Currently, we have a piecewise-constant schedule, but one could instead detect when the loss plateaus and reduce the size of the square at that point. The advantage is that this yields a different adaptive schedule for every image, which can be better than a single schedule for all images.
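One way the plateau-based schedule could be sketched (this is a hypothetical scheme, not the attack's actual piecewise-constant schedule; `next_square_size` and its parameters are illustrative):

```python
def next_square_size(size, recent_losses, patience=30, tol=1e-4, min_size=1):
    """Hypothetical adaptive schedule: shrink the square when the loss plateaus.

    recent_losses: best loss values from the last iterations (oldest first).
    If over a full window of `patience` iterations the loss improved by less
    than `tol`, halve the square's side length (never below min_size).
    """
    if len(recent_losses) >= patience and (recent_losses[0] - min(recent_losses)) < tol:
        return max(min_size, size // 2)
    return size
```

In the worst case (loss always improving) this degenerates to a constant square size, so it can only deviate from the fixed schedule when progress actually stalls.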

machanic commented 4 years ago

@max-andr Thank you very much! Your study is excellent, and the experimental results are remarkable! Currently, I am searching for ways to improve random search, which is the core of your algorithm. I have searched a lot of the literature, but random search seems to be a rather remote topic; I cannot find any literature on improving random search itself. Do you have any ideas on how to improve the core of your algorithm, random search?

I will also consider your suggestions, cite your paper, and use it as my baseline.

max-andr commented 4 years ago

> Do you have any ideas on how to improve the core of your algorithm, random search?

I do not see how the core idea (randomized search over the extreme points of the constraint set) can be improved substantially. But what one can certainly try to improve is the sampling distribution. That's exactly where one can easily incorporate some domain-specific prior knowledge into the problem. One example that we did was to use square-shaped updates in the sampling distribution where we change all the values in a square to the same value (at least, for Linf-perturbations). One can think about other, more advanced schemes on how to improve the sampling distribution (both for L2 and Linf).
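For reference, the square-shaped Linf sampling distribution mentioned above can be sketched roughly as follows (a simplification of the paper's algorithm; the function and variable names here are ours, not from the repo):

```python
import numpy as np

def square_update(x_shape, window, eps, rng):
    """Sketch of the square-shaped Linf update: pick a random window x window
    square and one random sign per channel; every pixel inside the square is
    set to that same +/-eps value. Simplified from the paper's algorithm."""
    c, h, w = x_shape
    r = rng.integers(0, h - window + 1)  # top-left corner of the square
    s = rng.integers(0, w - window + 1)
    update = np.zeros(x_shape)
    update[:, r:r + window, s:s + window] = (
        eps * rng.choice([-1.0, 1.0], size=(c, 1, 1))  # one sign per channel
    )
    return update

rng = np.random.default_rng(0)
u = square_update((3, 32, 32), window=8, eps=0.05, rng=rng)
# Exactly one 8x8 square per channel is set to +/-0.05; the rest is zero.
```

Any domain-specific prior (stripes, multi-scale squares, etc.) would be incorporated by changing what this sampler returns, while the random-search loop around it stays the same.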

machanic commented 4 years ago

@max-andr I conducted an L_inf attack experiment on 1000 ImageNet images with Inception v3 as the target model. With the original Square Attack the performance is best: the average number of queries per successful image is 254. When I change the update window to vertical stripes, the average query count rises to 416, and horizontal stripes lead to an average of 424. Maybe the values inside the update window should not be stripes? They perform worse than setting all values constantly equal to epsilon, which is what the original, best-performing Square Attack does.

max-andr commented 4 years ago

Well, that's why we didn't use the stripes in the update ;)

But our hypothesis was that for some images updating with stripes may be better. So you could come up with an input-dependent scheme that automatically decides which perturbation type is better for a given image. If the scheme is reasonable, then in the worst case it should be as good as the vanilla Square Attack, but maybe there will be some improvement (perhaps a small one; one would need to benchmark it).