Open bkj opened 7 years ago
@bkj Hi, I'm not the author of the paper. The official implementation released by the author is wy1iu/LargeMargin_Softmax_Loss. I personally tried norm(w) * norm(x) * (cos(theta) / m) and it performed badly. Then I realized that at every angle theta, we need to shrink the angle by the same scale.
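For reference, the paper defines the margin function piecewise as psi(theta) = (-1)^k * cos(m*theta) - 2k for theta in [k*pi/m, (k+1)*pi/m], k in {0, ..., m-1}, which is continuous and monotonically decreasing on [0, pi]. A minimal NumPy sketch (assuming m = 2; `psi` and `naive` are illustrative names, not from the repo) contrasts it with the naive cos(theta)/m, which merely rescales the logit:

```python
import numpy as np

def psi(theta, m=2):
    """Piecewise margin function from the L-Softmax paper:
    psi(theta) = (-1)^k * cos(m*theta) - 2k on [k*pi/m, (k+1)*pi/m]."""
    k = np.minimum(np.floor(theta * m / np.pi), m - 1).astype(int)
    return (-1.0) ** k * np.cos(m * theta) - 2 * k

thetas = np.linspace(0, np.pi, 181)
vals = psi(thetas)

# psi is continuous and monotonically non-increasing on [0, pi] ...
assert np.all(np.diff(vals) <= 1e-9)
# ... with range [1 - 2m, 1]; for m = 2 that is [-3, 1]
assert np.isclose(vals[0], 1.0) and np.isclose(vals[-1], -3.0)

# cos(theta)/m, by contrast, only compresses the logit's magnitude;
# it does not impose a larger angular margin at every theta.
naive = np.cos(thetas) / 2
```

This illustrates the comment above: a uniform rescale like cos(theta)/m changes the scale of the logit but not the angular decision boundary, whereas psi pushes the target logit down at every theta.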
Interesting. Is there a dataset you use to test your implementations of these things? Something synthetic or MNIST or something?
Hi --
I was wondering where you got the idea for the specific construction of the L-softmax. It seems like maybe you could achieve a similar goal by enforcing a margin like
norm(W) * norm(x) * (m * cos(theta) - m + 1)
instead of
norm(W) * norm(x) * cos(m * theta)
as you do in the paper.
The former seems simpler because you don't have to worry about constructing a psi function that behaves well for all values of theta, m doesn't have to be integer valued, etc. Also, in the paper, the gradient of psi is 0 at pi/2, which AFAICT is an undesirable side effect of the choice of psi. Is that right, or is there some reason that grad psi(pi/2) should be 0?
The proposed alternative above would have the same shape as cos on [0, pi] but with a range of [1 - 2m, 1], which seems maybe more natural.
Thoughts? Am I missing something? Did you try this and it stunk in practice?
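A quick numerical check supports the gradient claim above. This is a sketch, assuming m = 2 and the paper's piecewise psi; `alt` is the proposed m*cos(theta) - m + 1, and the finite-difference helper is just for illustration:

```python
import numpy as np

M = 2  # margin parameter, assumed even here

def psi(theta, m=M):
    # paper's piecewise form: (-1)^k * cos(m*theta) - 2k on [k*pi/m, (k+1)*pi/m]
    k = min(int(theta * m / np.pi), m - 1)
    return (-1.0) ** k * np.cos(m * theta) - 2 * k

def alt(theta, m=M):
    # proposed alternative: affine transform of cos, range [1 - 2m, 1]
    return m * np.cos(theta) - m + 1

def num_grad(f, x, h=1e-6):
    # central finite difference
    return (f(x + h) - f(x - h)) / (2 * h)

g_psi = num_grad(psi, np.pi / 2)  # ~0 for even m: d/dtheta cos(m*theta) = -m*sin(m*theta)
g_alt = num_grad(alt, np.pi / 2)  # ~-m: the alternative keeps a nonzero slope at pi/2
print(g_psi, g_alt)
```

So for even m the paper's psi does have a vanishing gradient at pi/2, while the affine alternative keeps slope -m there, which is the asymmetry the question is pointing at.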
Thanks