thomas-tanay / post--L2-regularization

Distill submission

Review Report 1 -- Reviewer A #11

Open colah opened 6 years ago

colah commented 6 years ago

The following peer review was solicited as part of the Distill review process.

The reviewer chose to keep anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.

Distill is grateful to the reviewer for taking the time to give such a thorough review of this article. Thoughtful and invested reviewers are essential to the success of the Distill project.

Conflicts of Interest: Reviewer disclosed no conflicts of interest.


Overall Review: I thought the plots and intuition given were nice; below I have some comments and clarifying questions. I’d be surprised if this is the first work to give a relationship between weight decay and distance from the data to the margin, but I’m not personally aware of prior work that does this. Have the authors done a literature review to see if this relationship exists in prior work? I believe this is worth publishing on Distill but am not confident in this assessment.

Detailed Comments:

“First, it challenges conventional wisdom on generalization in machine learning.”

“Here, we challenge this intuition and argue instead that adversarial examples exist when the classification boundary lies close to the data manifold—independently of the image space dimension.”

“According to the new perspective, adversarial examples exist when the classification boundary approaches the data manifold in image space.”

thomas-tanay commented 6 years ago

Thank you for all the comments!

“I’d be surprised if this is the first work to give a relationship between weight decay and distance from the data to the margin, but I’m not personally aware of prior work that does this. Have the authors done a literature review to see if this relationship exists in prior work?”

It is true that our idea of regularization acting on the scaling of the loss function, and thereby resulting in a form of adversarial training, is fairly simple, and it would be surprising if it hadn't been discussed before. However, we are not aware of any prior work doing this in detail either.
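To spell out what we mean by the scaling of the loss function in the linear case, here is a rough sketch (loose notation; f(x) = w·x + b is the linear model, ℓ the per-example loss, and d(x) the signed distance to the boundary as defined below):

```latex
% Rough sketch: the weight norm sets the scale at which the per-example loss
% saturates as a function of the signed distance d(x) to the boundary.
y\, f(x) \;=\; \|w\| \cdot \frac{y\,(w \cdot x + b)}{\|w\|}
        \;=\; \|w\| \cdot y\, d(x)
\qquad\Longrightarrow\qquad
\ell\big(y\, f(x)\big) \;=\; \ell\big(\|w\| \cdot y\, d(x)\big)
```

Weight decay shrinks ‖w‖, so the loss remains sensitive to points lying further away from the boundary, which is what gives it an adversarial-training flavour.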

We do refer to Goodfellow's observation that adversarial training is similar to L1 regularization towards the end of the article [2]. There are also some results similar to ours in the linear case, and I will add references to them. First, there is the work of Xu et al. (2008) on the link between robustness and regularization of SVMs. Second, there is the work of Marron et al. (2007) on the data piling phenomenon with SVMs on high-dimension, low-sample-size data.

“In the first plot (and rest of paper) is adversarial distance d_adv the average distance between the training data and the boundary? Or just a single data point?”

We define the adversarial distance d_adv in the section “scaling the loss function” as:

d_adv = (1/|T|) · Sum_{(x,y) ∈ T} y · d(x)

It is the average distance between the training data and the boundary, with a negative contribution for the misclassified data.

Then in the following section (“Adversarial Distance and Tilting Angle”) we show that:

d_adv = ½ || i – j || cos(theta)

where i and j are the centroids of the two classes in the training set and theta is the tilting angle. Hence in the first plot, d_adv is the average distance between the training data and the boundary, which also happens to be the average of the distances from the boundary to the two class centroids.
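Incidentally, the equivalence is easy to check numerically. Below is a minimal sketch (not the code used in the article; the toy data and names are made up) that computes d_adv both ways for an arbitrary linear boundary on a balanced two-class set:

```python
import numpy as np

# Minimal numerical check (illustrative toy data, not the article's code).
# Labels are in {-1, +1} and the two classes are balanced.
rng = np.random.default_rng(0)
n = 500
X_pos = rng.normal(loc=+1.0, scale=1.0, size=(n, 2))   # class y = +1
X_neg = rng.normal(loc=-1.0, scale=1.0, size=(n, 2))   # class y = -1
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(n), -np.ones(n)])

# An arbitrary linear boundary w.x + b = 0 (in practice, the trained classifier).
w = np.array([1.0, 0.3])
b = 0.2

# (1) Definition: average signed distance to the boundary over the training
#     set, with a negative contribution for misclassified points.
d_signed = (X @ w + b) / np.linalg.norm(w)
d_adv_definition = np.mean(y * d_signed)

# (2) Centroid form: 1/2 * ||i - j|| * cos(theta), where theta is the angle
#     between w and the line joining the two class centroids.
i, j = X_pos.mean(axis=0), X_neg.mean(axis=0)
cos_theta = w @ (i - j) / (np.linalg.norm(w) * np.linalg.norm(i - j))
d_adv_centroids = 0.5 * np.linalg.norm(i - j) * cos_theta

# The two values agree (up to floating point) because the classes are
# balanced, so the bias term b cancels out of the average.
print(d_adv_definition, d_adv_centroids)
```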

I am thinking of adding a couple of figures in the section “Example: SVM on MNIST” to explain this result further.

“Why do adversarial examples challenge the fundamentals of generalization? Generalization deals with probability or the average behavior of a model under the data distribution. Adversarial examples deal with the worst case behavior of a model.”

This is a good point. I will clarify my statement and explain that while the existence of adversarial examples is not incompatible with good generalization, it does contradict many people's intuition of what it means to generalize well.

“I would cite the recent Madry et al. paper on adversarial training.”

Thank you for the reference. The field is very active at the moment and there have been a number of important publications since I started writing. There is also Carlini et al. on establishing ground-truth adversarial examples, and Koh and Liang discussing adversarial training examples. Also interesting is Rozsa et al., who argue that the existence of adversarial examples is related to overfitting within each mini-batch. I will add references to these different results.

“I don’t see why this challenges the earlier intuition, the two perspectives are complementary.”

There are several aspects to the “linear explanation of adversarial examples” from Goodfellow et al. [2]. One general aspect concerns the fact that deep neural networks tend to behave linearly in the neighbourhood of their inputs, making it possible to change their predictions with small linear perturbations (the fast gradient sign method). This result is indeed compatible with ours.

However, a narrower aspect is the explanation of the existence of adversarial examples for linear models as a property of the dot product in high dimension (Section 3 in [2]). This explanation contradicts our perspective: high dimensionality is neither necessary nor sufficient for adversarial examples to exist in linear classifiers (in particular, we show that they can exist in 2 dimensions).
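To make the 2-dimensional point concrete, here is a small self-contained sketch (made-up numbers, not the article's toy problem): a strongly tilted linear boundary can classify the training data perfectly while lying very close to it, so small perturbations along the boundary normal cross it.

```python
import numpy as np

# 2-D illustration (made-up numbers): tilting the boundary preserves the
# training accuracy but collapses the adversarial distance.
rng = np.random.default_rng(0)
n = 50
# The two classes differ along the first coordinate only; the second
# coordinate carries almost no signal.
X_pos = np.column_stack([np.full(n, +1.0), rng.uniform(-0.01, 0.01, n)])
X_neg = np.column_stack([np.full(n, -1.0), rng.uniform(-0.01, 0.01, n)])
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(n), -np.ones(n)])

for theta_deg in [0, 60, 89]:                        # tilting angle of the boundary
    theta = np.deg2rad(theta_deg)
    w = np.array([np.cos(theta), np.sin(theta)])     # unit normal of the boundary
    d_signed = X @ w                                 # signed distance (b = 0, ||w|| = 1)
    accuracy = np.mean(np.sign(d_signed) == y)
    d_adv = np.mean(y * d_signed)
    print(f"theta = {theta_deg:2d} deg:  accuracy = {accuracy:.2f},  d_adv = {d_adv:.3f}")

# Accuracy stays at 1.00 in all three cases, but d_adv shrinks from 1.0 to
# about 0.02 at 89 degrees: tiny perturbations along w then flip the class.
```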

“One can make the same statement in the higher dimensional data space.”

I am not sure how to understand the distinction between “data space” and “image space” here. I think the confusion might come from the fact that in our toy problem, dimensions are not pixels (as is usually the case with images) but concentric squares. However, one can imagine the same toy problem with images formed of only two adjacent pixels (we discussed the pros and cons of the two configurations in #6, and settled on the version with concentric squares as it is more compelling visually).

Hence I think “data space”, “image space” and “pixel space” are all synonymous here, and this space is 2-dimensional. I will try to clarify this in the text.

“How are the adversarial examples generated for the LeNet trained with weight decay (Projected gradient descent? Single step fast gradient sign method?)”

They were generated using projected gradient descent (performed until the median confidence level of 0.95 was reached). This should indeed be specified; I will give more details in the text.
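For concreteness, the generation loop is along the following lines (a rough sketch rather than the exact code used for the article; `model` maps images to logits, `x` is a batch of images in [0, 1], `target` holds the adversarial labels, the step size and iteration budget are placeholders, and the projection is assumed to be onto the valid pixel range):

```python
import torch
import torch.nn.functional as F

def pgd_to_confidence(model, x, target, step=0.05, confidence=0.95, max_iters=500):
    # Rough sketch: normalized gradient steps that increase the confidence of
    # the adversarial target class, projected back onto the pixel range, until
    # the median target confidence over the batch reaches the threshold.
    x_adv = x.clone().detach()
    for _ in range(max_iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)      # -log p(target | x_adv)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            g = grad.flatten(1)
            g = g / g.norm(dim=1, keepdim=True).clamp_min(1e-12)
            x_adv = (x_adv - step * g.reshape(x_adv.shape)).clamp(0.0, 1.0)
            conf = F.softmax(model(x_adv), dim=1).gather(1, target.view(-1, 1))
            if conf.median() >= confidence:               # 0.95, as mentioned above
                break
    return x_adv.detach()
```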

“I believe similar results for MNIST have been demonstrated with models trained with an RBF kernel (I think Ian Goodfellow has done work on this).”

You might be thinking about Section 7 of Goodfellow et al. [2]. A couple of other works now suggest that it is possible to regularize neural networks against adversarial examples on MNIST (e.g. the Madry et al. paper and the Rozsa et al. paper).

Note that we consider our result on MNIST as an invitation to explore further, rather than as a strong claim that weight decay is the definitive solution to the adversarial example phenomenon in neural networks. In particular, we are still trying to understand in detail what influences the scaling of the weights (weight decay, momentum, batch norm, etc.), and we are still trying to replicate this result on deeper models and more complex datasets.