In Figure 9 of your paper, I noticed that with the L2 norm the landscape becomes narrower around the minimum, which differs from the previous figures.
I know that you are choosing the vectors in a different way here, via PCA. I can also see a result-to-cause explanation: the L2 norm makes training harder, so the convex region is smaller. However, I am curious whether you have any deeper insight into this pattern. Thanks!