max-andr / square-attack

Square Attack: a query-efficient black-box adversarial attack via random search [ECCV 2020]
https://arxiv.org/abs/1912.00049
BSD 3-Clause "New" or "Revised" License

1. Hi, I would like to know what you refer to as the Rademacher distribution in A.4, and why #12

Closed Kitzzaaa closed 2 years ago

Kitzzaaa commented 2 years ago

[Screenshot 2022-03-13 203726]

2. On page 21 of the paper (see my screenshot below), how do I get to the last step from (iii)?

[Screenshot 2022-03-13 204059]

3. In Sec. 4.1 you only proved that g is an L-smooth objective function, but the loss function (1) in the paper is clearly not L-smooth, so how do you explain the convergence of the algorithm you propose in this paper?

Kitzzaaa commented 2 years ago

@max-andr

max-andr commented 2 years ago

Hi,

  1. Rademacher distribution is the uniform distribution over {-1, 1}, i.e. we take -1 or 1 with 50% probability. I think it's pretty common to call this distribution "Rademacher", e.g. see the Wikipedia page: https://en.wikipedia.org/wiki/Rademacher_distribution.
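For concreteness, here is a minimal sketch of sampling from the Rademacher distribution with NumPy (the variable names are illustrative, not from the paper's code):

```python
import numpy as np

# Rademacher distribution: -1 or +1, each with probability 1/2.
rng = np.random.default_rng(0)
samples = rng.choice([-1, 1], size=10_000)

print(set(samples.tolist()))       # only the two values {-1, 1} ever occur
print(abs(samples.mean()) < 0.05)  # empirical mean is close to the true mean 0
```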

  2. Hmm, the Jensen's inequality in step (iii) actually should not be there, so please just ignore that step. The final lower bound is correct, though. The trick is to write out E_{(r,s)} ||V_{(r,s)}||_2 explicitly as the sum of the L2 norms of the patches V_{(r,s)}, normalized by w^2 (since there are w^2 possible positions of the coordinates r and s). We can then reduce this sum to h^2 sums of L2 norms over non-overlapping patches, where each of these sums is lower bounded by ||v||_2 (this is essentially the fact that the L1-norm is always greater than or equal to the L2-norm). Thus, we get E_{(r,s)} ||V_{(r,s)}||_2 \geq h^2 / w^2 ||v||_2.

  3. Indeed, for untargeted attacks we use the margin loss, which is not smooth. However, for targeted attacks we do use a smooth loss (cross-entropy). Actually, we could have used the cross-entropy loss for untargeted attacks as well; I don't think it would change the results much. But I agree that we should have commented on this discrepancy between the use of the margin loss and the L-smoothness assumption.
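To illustrate the distinction, here is a small sketch of the two losses on illustrative logits (these helper functions are mine, not from the repo). The margin loss is piecewise linear in the logits, hence not L-smooth, while cross-entropy is infinitely differentiable:

```python
import numpy as np

def margin_loss(logits, y):
    # f_y - max_{k != y} f_k: piecewise linear, hence non-smooth at ties
    other = np.delete(logits, y)
    return logits[y] - other.max()

def cross_entropy(logits, y):
    # -log softmax_y: smooth everywhere
    z = logits - logits.max()              # shift for numerical stability
    return -(z[y] - np.log(np.exp(z).sum()))

logits = np.array([2.0, 1.0, -0.5])
print(margin_loss(logits, y=0))            # 1.0
print(cross_entropy(logits, y=0))          # positive, smooth in the logits
```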

These were all good points, thank you for the careful reading of our paper! Let me know if you find something else.

Best, Maksym