tomgoldstein / loss-landscape

Code for visualizing the loss landscape of neural nets
MIT License

Trajectory plot #1

Closed ghost closed 5 years ago

ghost commented 5 years ago

To reproduce Figure 10 in your paper, I assume I should first generate the PCA directions with plot_trajectory.py and then use those directions to plot the loss contours of the final model.

I had to write my own code because of some technical difficulties. However, I noticed that for my data the trajectory should start at a loss of about 0.9, yet the loss contour of the final model at the trajectory's starting point is far from 0.9. This makes me think there is no guarantee that the contours the plotted trajectory crosses reflect the real loss; in other words, the loss contours of the final model do not show the "loss landscape" along the trajectory. However, when I reduced the number of models used for the PCA, the trajectory's starting point landed near the 0.9 contour of the final model.

This seems reasonable: the contours are computed by perturbing the final model along pc1 and pc2, whereas the trajectory shows each model projected onto pc1 and pc2. A model can be far from its projection, so the true loss of a model on the trajectory can be far from the contour value at its projected position.
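To make the projection argument concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption for illustration: `toy_loss` is a hypothetical quadratic standing in for a network loss, the parameter space has only three dimensions, and `pc1` and `pc2` are hand-picked orthonormal directions rather than real PCA output.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def toy_loss(theta):
    # Hypothetical stand-in for a network loss: a quadratic bowl with its minimum at the origin.
    return sum(t * t for t in theta)

# Final model (the origin of the plot) and an early checkpoint in a 3-parameter toy space.
theta_final = [0.0, 0.0, 0.0]
theta_early = [0.6, 0.3, 0.8]

# Two orthonormal plotting directions standing in for pc1 and pc2 (assumed, not real PCA output).
pc1 = [1.0, 0.0, 0.0]
pc2 = [0.0, 1.0, 0.0]

# 2-D coordinates of the early checkpoint on the trajectory plot.
diff = [a - b for a, b in zip(theta_early, theta_final)]
x, y = dot(diff, pc1), dot(diff, pc2)

# The point in parameter space that the contour value at (x, y) actually refers to.
theta_on_plane = [f + x * p + y * q for f, p, q in zip(theta_final, pc1, pc2)]

# Distance between the checkpoint and its projection onto the plotted plane.
residual = math.sqrt(sum((a - b) ** 2 for a, b in zip(theta_early, theta_on_plane)))

true_loss = toy_loss(theta_early)        # loss of the checkpoint itself
contour_loss = toy_loss(theta_on_plane)  # loss the contour plot shows at (x, y)

print(f"residual={residual:.2f} true={true_loss:.2f} contour={contour_loss:.2f}")
```

In this toy case the checkpoint's own loss is 1.09 while the contour at its projected position reads 0.45, and the gap comes entirely from the 0.8 of distance that lies outside the plotted plane.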

I understand that pc1 and pc2 explain most of the variance among the parameters of all the models, but there is no guarantee that they explain most of the difference between any given model and the final model. Is that probably why I got more "accurate" results when I used fewer models to estimate the principal components?
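The effect of the number of models can be sketched with NumPy on random data. This is an illustration under stated assumptions, not the repository's procedure: `checkpoints` is hypothetical random data, and the directions come from an uncentered SVD of the difference vectors, whereas a library PCA would typically center the rows first. With only two difference vectors, two directions span both of them exactly, so the starting point has no out-of-plane component; with more checkpoints that guarantee is lost.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical flattened checkpoints: 10 models with 20 parameters each.
# The last row plays the role of the final model.
checkpoints = rng.normal(size=(10, 20))
theta_final = checkpoints[-1]
diffs = checkpoints[:-1] - theta_final  # one row per earlier checkpoint

def top2_directions(rows):
    # Top-2 right singular vectors of the uncentered difference matrix
    # (an assumption; a library PCA would typically center the rows first).
    _, _, vt = np.linalg.svd(rows, full_matrices=False)
    return vt[:2]

def residual_norm(vec, directions):
    # Distance from vec to its projection onto the span of the orthonormal directions.
    projection = directions.T @ (directions @ vec)
    return float(np.linalg.norm(vec - projection))

start = diffs[0]  # starting checkpoint relative to the final model

res_all = residual_norm(start, top2_directions(diffs))      # directions from all 9 differences
res_few = residual_norm(start, top2_directions(diffs[:2]))  # directions from only the first 2

print(f"residual with all checkpoints: {res_all:.4f}")
print(f"residual with two checkpoints: {res_few:.2e}")
```

A smaller residual means the starting point lies closer to the plotted plane, so the contour value at its projection is a better approximation of its loss when the weights alone are considered.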

ljk628 commented 5 years ago

Hi Shuo,

Your observation is correct: the loss values read off the contours at the projected points of the optimization path are not the exact loss values of those checkpoints. The reasons are:

1) The projected model is not the same as the original model.
2) When we plot the loss contours of the final model, the running mean and running variance of the BN layers are fixed. These statistics can be quite different from their values at the early stage of optimization. It is hard (if not impossible) to plot the exact loss values of all checkpoints with different BN statistics in one figure. In other words, the loss surface is not a "fixed" one as we used to imagine, but varies during optimization as these statistics change.
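The second point can be illustrated with a minimal sketch in which the same weight yields different losses under different running statistics. All names and numbers here are hypothetical: `batchnorm_eval` is a hand-written eval-mode normalization for a single feature without the affine scale and shift, and the two sets of statistics are made-up values standing in for early and final checkpoints.

```python
import math

def batchnorm_eval(x, running_mean, running_var, eps=1e-5):
    # Eval-mode batch norm for one feature, without the affine scale and shift.
    return (x - running_mean) / math.sqrt(running_var + eps)

def toy_loss(weight, inputs, targets, running_mean, running_var):
    # Hypothetical one-weight "network": normalize, scale by weight, mean squared error.
    preds = [weight * batchnorm_eval(x, running_mean, running_var) for x in inputs]
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(inputs)

inputs = [1.0, 2.0, 3.0, 4.0]
targets = [-1.0, 0.0, 1.0, 2.0]
weight = 1.0  # the same weight is evaluated under both sets of statistics

# Assumed statistics: values tracked early in training vs. at the final checkpoint.
loss_early_stats = toy_loss(weight, inputs, targets, running_mean=0.0, running_var=4.0)
loss_final_stats = toy_loss(weight, inputs, targets, running_mean=2.0, running_var=1.0)

print(f"loss with early statistics: {loss_early_stats:.4f}")
print(f"loss with final statistics: {loss_final_stats:.2e}")
```

The weight is identical in both calls, so the difference in loss comes only from the normalization statistics, which is why a contour plot computed with one fixed set of statistics cannot match every checkpoint.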

ghost commented 5 years ago

Thank you.