Closed hengck23 closed 7 years ago
After some checking: the weighting terms (1-p)^gamma and p^gamma are back-propagated as well. You can refer to:
https://github.com/zimenglan-sysu-512/paper-note/blob/master/focal_loss.pdf https://github.com/unsky/focal-loss
This is not an issue but a question.
I think the terms (1-p)^gamma and p^gamma in focal loss are for weighting only. They should not be back-propagated through during gradient descent. Am I correct?
If so, do you need to detach() the variables used to compute the weight terms in your focal loss function?
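
To make the difference between the two readings concrete, here is a minimal sketch in plain Python (no PyTorch), assuming the single-example form FL(p) = -(1-p)^gamma * log(p) for the true class, with the alpha balancing factor left out. The function names are hypothetical and not taken from the repository. `grad_full` differentiates the weight term too, while `grad_weight_detached` treats (1-p)^gamma as a constant, which is roughly what calling detach() on the weight would do.

```python
import math


def focal_loss(p, gamma=2.0):
    """Focal loss for the true class with predicted probability p (alpha omitted)."""
    return -((1.0 - p) ** gamma) * math.log(p)


def grad_full(p, gamma=2.0):
    """d(FL)/dp with the weight (1-p)^gamma differentiated as well (product rule)."""
    return gamma * (1.0 - p) ** (gamma - 1.0) * math.log(p) - (1.0 - p) ** gamma / p


def grad_weight_detached(p, gamma=2.0):
    """d(FL)/dp with (1-p)^gamma held constant, as if the weight were detached."""
    return -((1.0 - p) ** gamma) / p


def numeric_grad(f, p, eps=1e-6):
    """Central finite difference, used only to check the analytic gradients."""
    return (f(p + eps) - f(p - eps)) / (2.0 * eps)


if __name__ == "__main__":
    p = 0.3
    print("numeric :", numeric_grad(focal_loss, p))
    print("full    :", grad_full(p))
    print("detached:", grad_weight_detached(p))
```

The finite-difference gradient of the loss as written matches `grad_full`, so differentiating the formula literally does include the weight term. The detached variant drops the gamma * (1-p)^(gamma-1) * log(p) part and therefore optimizes a different objective.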