thomas-tanay / post--L2-regularization

Distill submission

Review Report 2 - Anonymous Reviewer B #12

Open · colah opened this issue 6 years ago

colah commented 6 years ago

The following peer review was solicited as part of the Distill review process.

The reviewer chose to keep anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.

Distill is grateful to the reviewer for taking the time to give such a thorough review of this article. Thoughtful and invested reviewers are essential to the success of the Distill project.

Conflicts of Interest: Reviewer disclosed no conflicts of interest.


The present submission proposes an explanation for adversarial examples based on insufficient regularization of the model. It is argued that a lack of regularization leads to a decision boundary which is skewed and hence vulnerable to adversarial examples.

In my opinion, this article suffers from a few issues. The three main issues are:

  1. It makes claims (at least suggestively) about adversarial examples that I think are wrong.
  2. Its coverage of the related work is not very good.
  3. It uses language (such as "A new angle" and "We challenge this intuition") that seems to claim substantially more novelty than there actually is. I don't mind an expository paper that is not novel, but I do mind over-claiming.

In addition, I felt the writing could be improved, but I assume that other reviewers can comment on that in more depth.

Perhaps the most serious issue is issue 1. At a high level, I am not convinced that a lack of L2 regularization / a tilted decision boundary is the primary reason for the existence of adversarial examples. While this is a natural explanation to consider, it does not match my own empirical experience; furthermore, my sense is that this explanation has occurred to others in the field as well but has been de-emphasized because it does not account for all the facts on the ground. (I wish I had good citations covering this, but I do not know if/where it has been discussed in detail; however, there are various empirical papers showing that L2 regularization/weight decay does not work very well in comparison to adversarial training and other techniques.)

More concretely, three claims that I think are wrong are the following (a brief explanation of why I think each is wrong is given after each point):

  1. that adversarial examples are primarily due to tilting of the decision boundary --- in high dimensions, every decision boundary (tilted or not) might have trouble avoiding adversarial examples (see the sketch after this list)
  2. that weight decay for deep networks confers robustness to adversarial examples --- weight decay seems to be too crude an instrument and confers only limited robustness
  3. that "the fact that [neural networks] are often vulnerable to linear attacks of small magnitude suggests that they are strongly under-regularized" --- many authors have found that neural networks actually underfit the data during adversarial training and need additional capacity to successfully fit adversarial perturbations