Open AlexeyAB opened 6 years ago
Or is it correct as written here? https://blog.csdn.net/linmingan/article/details/77885832
For Focal Loss (when gamma=2), delta is:
if (i == j)
then delta = (1-p) * -1 * alpha * (1 - pt) * (2 * pt * log(pt) + pt - 1)
if (i != j)
then delta = (-p) * -1 * alpha * (1 - pt) * (2 * pt * log(pt) + pt - 1)
Because:
float grad =-2*(1-output[ti])*logf(fmaxf(output[ti],0.0000001))*output[ti]+(1-output[ti])*(1-output[ti]);
Or, equivalently, writing pt for output[ti]:
float grad
= -2*(1-pt)*log(pt)*pt + (1-pt)*(1-pt)
= (1-pt)*(-2*pt*log(pt)) + (1-pt)*(1-pt)
= (1-pt)*(-2*pt*log(pt) + (1-pt))
= -1 * (1-pt)*(2*pt*log(pt) + pt - 1)
Hi @unsky, could you please check whether I read your Focal Loss formulas correctly?
For CE, delta is:
if (i == j)
then delta = 1-p
if (i != j)
then delta = -p
For Focal Loss (when gamma=2), delta is:
if (i == j)
then delta = (1-p) * alpha * (1 - pt) * (2 * pt * log(pt) + pt - 1)
if (i != j)
then delta = (-p) * alpha * (1 - pt) * (2 * pt * log(pt) + pt - 1)
Where:
pt = softmax(i) is the probability of the correct class id,
p = softmax(j),
i = label is the truth class id.