Closed kdg1993 closed 1 year ago
What
The original ASL formula from the paper (https://arxiv.org/pdf/2009.14119.pdf) is
$$ ASL(y, p) = -\Big( y\,(1-p)^{\gamma_{+}} \log(p) + (1-y)\, p_{m}^{\gamma_{-}} \log(1-p_{m}) \Big), \quad \text{where } p_{m} = \max(p-m,\ 0) $$
However, the implemented version of ASL is
$$ -\Big( \big( 1 - y\,p - (1-y)(1-p_{m}) \big)^{y\,\gamma_{+} + (1-y)\,\gamma_{-}} \big( y \log(p) + (1-y) \log(1-p_{m}) \big) \Big), \quad \text{where } p_{m} = \max(p-m,\ 0) $$
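If y is neither 0 nor 1, a single weight with the mixed exponent $y\,\gamma_{+} + (1-y)\,\gamma_{-}$ multiplies both logarithmic terms, whereas the paper applies $\gamma_{+}$ only to the positive term and $\gamma_{-}$ only to the negative term.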
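
As a quick sanity check (a worked substitution, not taken from the paper or the code), plugging hard labels into the implemented form recovers the paper's two terms, which suggests the two versions only diverge for soft labels. For y = 1:

$$ -\big( 1 - p \big)^{\gamma_{+}} \log(p) $$

For y = 0:

$$ -\big( 1 - (1-p_{m}) \big)^{\gamma_{-}} \log(1-p_{m}) = -\,p_{m}^{\gamma_{-}} \log(1-p_{m}) $$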
Why
Because the ASL implementation assumes y (the ground truth) is either 0 or 1, it conflicts with label smoothing, which produces targets strictly between 0 and 1.
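
The mismatch can be illustrated numerically with a minimal scalar sketch. The function names `asl_paper` and `asl_implemented` are hypothetical, the values `gamma_pos=1.0`, `gamma_neg=4.0`, `m=0.05` and the soft target `y=0.9` are assumptions chosen for illustration, and plain `math` stands in for the tensor code used in the actual implementation.

```python
import math


def asl_paper(y, p, gamma_pos=1.0, gamma_neg=4.0, m=0.05):
    """Paper form: each log term carries its own focusing factor."""
    p_m = max(p - m, 0.0)  # shifted probability for the negative term
    pos = y * (1.0 - p) ** gamma_pos * math.log(p)
    neg = (1.0 - y) * p_m ** gamma_neg * math.log(1.0 - p_m)
    return -(pos + neg)


def asl_implemented(y, p, gamma_pos=1.0, gamma_neg=4.0, m=0.05):
    """Implemented form: one shared weight multiplies both log terms."""
    p_m = max(p - m, 0.0)
    base = 1.0 - y * p - (1.0 - y) * (1.0 - p_m)
    exponent = y * gamma_pos + (1.0 - y) * gamma_neg
    return -(base ** exponent) * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p_m))


p = 0.7
for y in (1.0, 0.0, 0.9):  # hard positive, hard negative, assumed smoothed positive
    print(y, asl_paper(y, p), asl_implemented(y, p))
```

Under these assumptions the two functions agree for y = 1 and y = 0 and return different values for y = 0.9.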
How