thomas-tanay / post--L2-regularization

Distill submission

Toy Problem #6

Open thomas-tanay opened 7 years ago

thomas-tanay commented 7 years ago

Thanks for the quick and already detailed comments; this is great!

I'm opening a new issue here to address comments from issues #2 and #5. If I understand correctly, both issues raise a similar point: the introductory toy problem is useful but sometimes hard to follow. This might discourage some readers (especially since it is the first section of the article), and it should be simplified and/or made more intuitive. I agree with these concerns. And since the classes of images considered are arbitrary, many alternatives are possible.

Before I discuss this further, I should clarify the different constraints that led me to this particular choice of classes:

1. There should be a clear, linearly separable feature distinguishing the two classes. → white/black images.
2. There should be some intra-class variability, to make the problem more realistic. → random values in the intervals [-1, -0.1] and [0.1, 1] (I didn't want the two classes to intersect at 0).
3. There should be at least one flat direction of variation along which the classification boundary can tilt (natural data tends to have more than one). → null half image.
4. The class definitions should be valid whether the dimensionality of the problem is 2 or 200, to show that dimensionality does not influence the phenomenon. → with 2 dimensions, there is one random pixel and one null pixel; more dimensions lead to a random half image and a null half image.
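For concreteness, here is a rough sketch of how classes satisfying these constraints could be generated (the function name and exact sampling scheme are my own illustration, not the article's code):

```python
import numpy as np

def make_toy_data(n_samples, n_dims=2, seed=0):
    """Illustrative sketch: half the 'image' carries the class signal,
    the other half is null (the flat directions of variation)."""
    rng = np.random.default_rng(seed)
    half = n_dims // 2
    y = rng.choice([-1, 1], size=n_samples)
    # Magnitudes in [0.1, 1] so the two classes never meet at 0.
    mags = rng.uniform(0.1, 1.0, size=(n_samples, half))
    X = np.zeros((n_samples, n_dims))
    X[:, :half] = y[:, None] * mags  # class -1 -> [-1, -0.1], class +1 -> [0.1, 1]
    return X, y  # with n_dims = 2: one random pixel, one null pixel
```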

Among the modifications suggested in issues #2 and #5, some are at least partially in conflict with the constraints just described.

Concerning the image representation used, I think it is important to show the actual images (so that readers can see the adversarial examples) and their projections in the deviation plane (so that readers can understand why the phenomenon occurs). I like the idea of using arrows to visualize the vectors (suggested in issue #5), but my concern is that introducing a third representation for the images might be slightly redundant, and confusing to some readers.

On the other hand, a number of the modifications suggested in issues #2, #3 and #5 are compatible with the current class definitions and can help make the toy problem more accessible.

I'll keep this issue in mind, and try to think of other ways to simplify the setup of the toy problem.

colah commented 7 years ago

Glad the feedback was helpful! :)

I'd push back very slightly on desideratum 4 (the 2-or-200-dimensions one). One disadvantage is that it's a little weird to think about the same problem in a varying number of dimensions at one time. Your diagram is also simpler if the axes are dimensions instead of linear subspaces -- it's a little tricky to reason about what the axes are when there are 200 dimensions.

> a lot of people might find it difficult to regard 2-dimensional vectors as images (and this might make the toy problem even less convincing for them)

I think you may be raising two concerns here:

  1. We'd like to present adversarial counter-examples from the toy problem in a visually compelling way. #5 addresses some ways of doing this for 2-dimensional problems.

  2. Two-dimensional examples may feel less compelling to the reader because they're not like images. I think this may actually cut the other way, however. If you use an example that tries to be like an image but clearly isn't realistic, you may make the reader more skeptical than if you own the lack of realism with the simplest, most minimal example.

That said, I think you can do an excellent job with your present problem if you want to stick with that, so please feel free to ignore me. :)

thomas-tanay commented 7 years ago

After reflection, I agree that making the toy problem independent of the dimension might be confusing to the reader. It might be clearer to focus on the 2-dimensional case first, and then later introduce the idea that higher-dimensional problems can also be visualized in the plane.
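As a generic sketch of what I mean by visualizing a higher-dimensional problem in the plane (the helper and the choice of spanning directions are illustrative, not the article's exact construction of the deviation plane):

```python
import numpy as np

def project_to_plane(X, u, v):
    """Return the 2D coordinates of each row of X in the plane spanned by
    directions u and v, orthonormalized via Gram-Schmidt. For a deviation
    plane, u and v could be, e.g., the classifier's weight vector and
    another direction of interest (my illustration, not a definition)."""
    e1 = u / np.linalg.norm(u)
    v = v - (v @ e1) * e1          # remove the component along e1
    e2 = v / np.linalg.norm(v)
    return np.column_stack([X @ e1, X @ e2])
```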

In that case, using an inner square/outer background type of representation would indeed be more compelling (although it departs from the notion that dimensions are pixels in images).

If we decided to expand the toy problem section, it could also be an opportunity to give the reader a hint as to why the angle of the classification boundary can sometimes be so unnatural. This is something we did in our arXiv paper “A Boundary Tilting Perspective on the Phenomenon of Adversarial Examples”. The toy problem we considered there was slightly different, but when we trained an SVM on it, the resulting weight vector also did not suffer from adversarial examples:

[image: toyexample1]

If we then added corrupted images (100% random images) to our training data, however, the weight vector we obtained suffered from strong adversarial examples:

[image: withoutregularisation]

This happens because the SVM (without regularization) overfits the corrupted images in the training data.
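To make the experiment concrete, here is a rough sketch (assuming scikit-learn's LinearSVC with a very large C as a stand-in for the unregularized SVM; the corruption scheme and random labels are my illustration, not necessarily the paper's exact setup):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Clean toy data, as above: random half image, null half image, 200 dims.
n, half = 1000, 100
y = rng.choice([-1, 1], size=n)
X = np.zeros((n, 2 * half))
X[:, :half] = y[:, None] * rng.uniform(0.1, 1.0, size=(n, half))

# Corrupted training points: 100% random images, with labels assigned
# at random (an assumption for this sketch).
X_bad = rng.uniform(-1.0, 1.0, size=(50, 2 * half))
y_bad = rng.choice([-1, 1], size=50)
X_train = np.vstack([X, X_bad])
y_train = np.concatenate([y, y_bad])

# A very large C approximates the unregularized SVM, which is free to tilt
# the boundary into the null directions to fit the corrupted points.
w = LinearSVC(C=1e6, max_iter=200_000).fit(X_train, y_train).coef_[0]
print(np.linalg.norm(w[half:]) / np.linalg.norm(w[:half]))  # tilt into null half
```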

We weren't sure this added complexity was helpful here and decided to leave it out entirely (deferring the justification for why the boundary tilts to the next section, on MNIST). In any case, I agree that the toy problem section would greatly benefit from a small introductory warning acknowledging these concerns (“this toy problem will feel unnatural at first, but bear with us: it introduces a perspective on the phenomenon that will become more compelling in the following section”).

colah commented 7 years ago

Sorry about dropping the ball on this!! It completely fell through the cracks.

> After reflection, I agree that making the toy problem independent of the dimension might be confusing to the reader.

(Please don't feel pressure here. My sense is that this could help, but I also think it could be great as is. I have a bit of a penchant for pushing for 2D examples when possible, and it isn't always right!)

> If we then added corrupted images (100% random images) to our training data, however...

I like this example! If you don't feel like it's something you want to explore in the introductory example, your small acknowledgement could even briefly allude to the fact that corrupted / noisy datapoints can cause this.

(If you did want to allude to this a bit more explicitly, it occurs to me that you could take the two-dimensional case and add a single corrupted data point to make the maximum-accuracy model have an arbitrarily bad tilting angle...)
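(A minimal sketch of that construction, using scikit-learn's LinearSVC with a large C as a stand-in for the maximum-accuracy linear model; the coordinates are made up:)

```python
import numpy as np
from sklearn.svm import LinearSVC

# 2D toy data: the first coordinate carries the class, the second is null.
X = np.array([[-0.9, 0.0], [-0.5, 0.0], [0.5, 0.0], [0.9, 0.0]])
y = np.array([-1, -1, 1, 1])

# One corrupted point: labelled +1, but sitting on the wrong side of the
# natural boundary, displaced along the flat direction. Moving it closer
# to the horizontal axis forces an arbitrarily bad tilting angle.
X = np.vstack([X, [[-0.5, 2.0]]])
y = np.append(y, 1)

# Large C ~ maximum-accuracy fit: the boundary tilts to accommodate the outlier.
w = LinearSVC(C=1e6, max_iter=200_000).fit(X, y).coef_[0]
print(np.degrees(np.arctan2(w[1], w[0])))  # angle of the weight vector
```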

colah commented 7 years ago

e.g.

[image]