Open bkj opened 7 years ago
@bkj Hi, I'm not the author of the paper. The official implementation released by the author is wy1iu/LargeMargin_Softmax_Loss. I personally tried norm(w) * norm(x) * (cos(theta) / m) and it performed badly. Then I realized that at every angle theta, we need to shrink the angle by the same scale.
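For reference, the paper defines the margin function piecewise as psi(theta) = (-1)^k * cos(m*theta) - 2k for theta in [k*pi/m, (k+1)*pi/m], k in {0, ..., m-1}, which is continuous and monotonically decreasing on [0, pi]. A minimal NumPy sketch (assuming m = 2; `psi` and `naive` are illustrative names, not from the repo) contrasts it with the naive cos(theta)/m, which merely rescales the logit:

```python
import numpy as np

def psi(theta, m=2):
    """Piecewise margin function from the L-Softmax paper:
    psi(theta) = (-1)^k * cos(m*theta) - 2k on [k*pi/m, (k+1)*pi/m]."""
    k = np.minimum(np.floor(theta * m / np.pi), m - 1).astype(int)
    return (-1.0) ** k * np.cos(m * theta) - 2 * k

thetas = np.linspace(0, np.pi, 181)
vals = psi(thetas)

# psi is continuous and monotonically non-increasing on [0, pi] ...
assert np.all(np.diff(vals) <= 1e-9)
# ... with range [1 - 2m, 1]; for m = 2 that is [-3, 1]
assert np.isclose(vals[0], 1.0) and np.isclose(vals[-1], -3.0)

# cos(theta)/m, by contrast, only compresses the logit's magnitude;
# it does not impose a larger angular margin at every theta.
naive = np.cos(thetas) / 2
```

This illustrates the comment above: a uniform rescale like cos(theta)/m changes the scale of the logit but not the angular decision boundary, whereas psi pushes the target logit down at every theta.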
Interesting. Is there a dataset you use to test your implementations of these things? Something synthetic or MNIST or something?
Hi --
I was wondering where you got the idea for the specific construction of the L-softmax. It seems like maybe you could achieve a similar goal by enforcing a margin like
norm(W) * norm(x) * (m * cos(theta) - m + 1)
instead of
norm(W) * norm(x) * cos(m * theta)
as you do in the paper.
The former seems simpler because you don't have to worry about constructing a psi function that behaves well for all values of theta, m doesn't have to be integer valued, etc. Also, in the paper, the gradient of psi is 0 at pi/2, which AFAICT is an undesirable side effect of the choice of psi. Is that right, or is there some reason that grad psi(pi/2) should be 0?
The proposed alternative above would have the same shape as cos on [0, pi] but with a range of [1 - 2m, 1], which seems maybe more natural.
Thoughts? Am I missing something? Did you try this and it stunk in practice?
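A quick numerical check supports the gradient claim above. This is a sketch, assuming m = 2 and the paper's piecewise psi; `alt` is the proposed m*cos(theta) - m + 1, and the finite-difference helper is just for illustration:

```python
import numpy as np

M = 2  # margin parameter, assumed even here

def psi(theta, m=M):
    # paper's piecewise form: (-1)^k * cos(m*theta) - 2k on [k*pi/m, (k+1)*pi/m]
    k = min(int(theta * m / np.pi), m - 1)
    return (-1.0) ** k * np.cos(m * theta) - 2 * k

def alt(theta, m=M):
    # proposed alternative: affine transform of cos, range [1 - 2m, 1]
    return m * np.cos(theta) - m + 1

def num_grad(f, x, h=1e-6):
    # central finite difference
    return (f(x + h) - f(x - h)) / (2 * h)

g_psi = num_grad(psi, np.pi / 2)  # ~0 for even m: d/dtheta cos(m*theta) = -m*sin(m*theta)
g_alt = num_grad(alt, np.pi / 2)  # ~-m: the alternative keeps a nonzero slope at pi/2
print(g_psi, g_alt)
```

So for even m the paper's psi does have a vanishing gradient at pi/2, while the affine alternative keeps slope -m there, which is the asymmetry the question is pointing at.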
Thanks