Closed by Kitzzaaa 2 years ago
@max-andr
Hi,
The Rademacher distribution is the uniform distribution over {-1, 1}, i.e. we take -1 or 1 with 50% probability each. I think it's pretty common to call this distribution "Rademacher", e.g. see the Wikipedia page: https://en.wikipedia.org/wiki/Rademacher_distribution.
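For concreteness, here is a minimal sketch of drawing Rademacher samples with only the Python standard library. The function name `sample_rademacher` and the fixed seed are illustrative choices, not something taken from the paper or its code.

```python
import random


def sample_rademacher(n, seed=0):
    """Draw n i.i.d. samples that are -1 or +1 with probability 1/2 each."""
    rng = random.Random(seed)  # local generator, so the global state is untouched
    return [rng.choice((-1, 1)) for _ in range(n)]


samples = sample_rademacher(10_000)
# The empirical mean should be close to the true mean, which is 0.
mean = sum(samples) / len(samples)
```

Since the distribution is symmetric around zero, its mean is 0 and its variance is 1.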
Hmm, the Jensen's inequality in step (iii) actually should not be there, so please just ignore that step. The final lower bound is correct, though. The trick is to explicitly write out E_{(r,s)} ||V_{(r,s)}||_2 as the sum of the L2 norms of the patches V_{(r,s)}, normalized by w^2 (since there are w^2 possible positions of the coordinates r and s). We can then split this sum into h^2 sums of L2 norms over non-overlapping patches, each of which is lower bounded by ||v||_2 (this lower bound is essentially the fact that the L1 norm is always greater than or equal to the L2 norm). Thus, we get E_{(r,s)} ||V_{(r,s)}||_2 \geq (h^2 / w^2) ||v||_2.
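The bound above can be checked numerically. The sketch below is only a sanity check under assumptions that are mine, not the paper's: v is a w x w grid, V_{(r,s)} is the h x h patch with top-left corner (r, s), patches wrap around the border (so there are exactly w^2 positions), and h divides w. The helper names are hypothetical.

```python
import math
import random


def patch_norm(v, r, s, h):
    """L2 norm of the h x h patch at (r, s), with wrap-around indexing (an assumption)."""
    w = len(v)
    return math.sqrt(sum(v[(r + i) % w][(s + j) % w] ** 2
                         for i in range(h) for j in range(h)))


def expected_patch_norm(v, h):
    """Average patch norm over all w^2 positions (r, s)."""
    w = len(v)
    total = sum(patch_norm(v, r, s, h) for r in range(w) for s in range(w))
    return total / w ** 2


rng = random.Random(0)
w, h = 8, 2  # h divides w, so each of the h^2 offsets tiles the grid exactly
v = [[rng.gauss(0.0, 1.0) for _ in range(w)] for _ in range(w)]
v_norm = math.sqrt(sum(x * x for row in v for x in row))

lhs = expected_patch_norm(v, h)   # E_{(r,s)} ||V_{(r,s)}||_2
rhs = h ** 2 / w ** 2 * v_norm    # (h^2 / w^2) ||v||_2
```

Under these assumptions, each of the h^2 offsets (r mod h, s mod h) gives a tiling of the grid by non-overlapping patches, and the sum of the patch norms in one tiling is at least ||v||_2, which is why `lhs >= rhs` holds.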
Indeed, for untargeted attacks we use the margin loss, which is not smooth. However, for targeted attacks we do use a smooth loss (cross-entropy). Actually, we could also have used the cross-entropy loss for untargeted attacks; I don't think it would change the results much. But I agree that we should've commented on this discrepancy between the usage of the margin loss and the L-smoothness assumption.
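To illustrate the difference between the two losses, here is a small sketch on raw logits. It uses the common definitions (margin loss as the true-class logit minus the largest other logit, cross-entropy as the negative log-softmax of the target class); I am assuming these match the formulation in the paper, and the function names are illustrative.

```python
import math


def margin_loss(logits, y):
    """f_y - max_{k != y} f_k; the max makes it non-smooth where the top competitor changes."""
    other = max(z for k, z in enumerate(logits) if k != y)
    return logits[y] - other


def cross_entropy(logits, t):
    """-log softmax(logits)[t], computed with a stable log-sum-exp; smooth in the logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[t]


logits = [2.0, 1.0, 0.5]
margin = margin_loss(logits, 0)  # positive while class 0 is still the top prediction
ce = cross_entropy(logits, 1)    # loss with respect to a hypothetical target class 1
```

The margin loss has kinks wherever two competing logits tie, whereas the cross-entropy is differentiable everywhere in the logits.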
These were all good points, thank you for the careful reading of our paper! Let me know if you find something else.
Best, Maksym
2. On page 21 of the paper, i.e. my diagram below, how do I get the last step from (iii)?
3. In Sec. 4.1, you only proved that g is an L-smooth objective function, but the loss function (1) in the paper is obviously not an L-smooth function, so how do you explain the convergence of the algorithm you proposed in this paper?