I have seen your implementation of focal loss in the README, and I noticed that examples with label == -1 are filtered out before the loss computation. Do you perform sampling when training the classification subnet, i.e., keep a positive:negative ratio of 1:3 as in Faster R-CNN, and then apply the focal loss? I'm a little confused, because in the paper the authors say they compute the focal loss over all examples.
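To make sure I'm reading it right, here is a rough sketch of how I currently understand the filtering (the function name, the sigmoid formulation, and the normalization are my own assumptions, not taken from your code): label == -1 would mark "ignored" anchors (e.g. those with intermediate IoU), which are dropped, while every remaining positive and negative still contributes to the loss with no 1:3 subsampling.

```python
import numpy as np

def focal_loss(logits, labels, alpha=0.25, gamma=2.0):
    # labels: 1 = positive, 0 = negative, -1 = ignore.
    # Filtering label == -1 removes only the "ignored" anchors; it is NOT
    # a sampled subset -- all remaining examples enter the loss.
    keep = labels != -1
    logits, labels = logits[keep], labels[keep]
    p = 1.0 / (1.0 + np.exp(-logits))            # sigmoid probability
    p_t = np.where(labels == 1, p, 1.0 - p)      # probability of the true class
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    # Normalize by the number of positive anchors, as in the paper.
    return loss.sum() / max((labels == 1).sum(), 1)
```

Under this reading, appending extra label == -1 entries should leave the loss value unchanged, which is why I'm asking whether any additional sampling happens on top of that.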