xmfbit opened this issue 6 years ago (status: Open)
OK, I see. You used a one-hot vector as gt_classes. But that raises a new question: the gradient (gt_class - prob) should be passed directly to the output of the final conv layer (let's call it `x`), whereas the code applies `softmax(x)` first (`prob_pred = F.softmax(score_pred.view(-1, score_pred.size()[-1])).view_as(score_pred)`), so autograd will backpropagate through the softmax operation as well. Is that right?
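To make the concern concrete, here is a minimal pure-Python sketch (not the repository's code) of what backpropagating a squared error through softmax does to the gradient. The logits `x`, the one-hot `gt`, and the helper names are made up for illustration, and the loss is taken as a plain sum of squared errors, which is an assumption about the reduction.

```python
import math

def softmax(x):
    # Subtract the max for numerical stability.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def mse_through_softmax_grad(x, gt):
    """Gradient of sum((softmax(x) - gt)^2) w.r.t. x, via the chain rule."""
    p = softmax(x)
    upstream = [2.0 * (pi - gi) for pi, gi in zip(p, gt)]  # dL/dp
    dot = sum(u * pi for u, pi in zip(upstream, p))
    # Softmax Jacobian: dp_j/dx_i = p_j * (delta_ij - p_i),
    # so dL/dx_i = p_i * (upstream_i - sum_j upstream_j * p_j).
    return [pi * (u - dot) for pi, u in zip(p, upstream)]

x = [2.0, -1.0, 0.5]   # hypothetical logits from the final conv layer
gt = [0.0, 1.0, 0.0]   # hypothetical one-hot ground truth

# What reaches x if (prob - gt) is written to it directly.
direct = [pi - gi for pi, gi in zip(softmax(x), gt)]
# What reaches x when the squared error is backpropagated through softmax.
through = mse_through_softmax_grad(x, gt)

print("direct :", direct)
print("through:", through)
```

The two vectors differ because the second one is multiplied by the softmax Jacobian, so under these assumptions the two formulations are not the same update on `x`.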
Hello xmfbit,
Did you manage to understand the loss function? I am struggling with that as well.
Good work. But I am confused about how the classification (cls) loss is calculated. It seems that you used MSELoss in your code. However, in darknet the gradient computation looks like the one for cross-entropy: see https://github.com/pjreddie/darknet/blob/master/src/region_layer.c#L130. On the other hand, in the YOLO9000 paper the author seems to use MSELoss, just as he did in YOLOv1.
Could you check this? Thank you.
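For reference, the reason a difference between the one-hot target and the softmax output "looks like cross-entropy" is that the gradient of softmax cross-entropy with respect to the logits is exactly prob - gt. A small pure-Python check of that identity, with made-up logits and a finite-difference helper that is only illustrative (it says nothing about what darknet itself computes):

```python
import math

def softmax(x):
    # Subtract the max for numerical stability.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(x, gt):
    # Cross-entropy between a one-hot target and softmax(x).
    p = softmax(x)
    return -sum(g * math.log(q) for g, q in zip(gt, p))

def numeric_grad(f, x, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    grad = []
    for i in range(len(x)):
        hi = list(x); hi[i] += eps
        lo = list(x); lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

x = [2.0, -1.0, 0.5]   # hypothetical class scores (logits)
gt = [0.0, 1.0, 0.0]   # hypothetical one-hot ground truth

analytic = [p - g for p, g in zip(softmax(x), gt)]            # prob - gt
numeric = numeric_grad(lambda v: cross_entropy(v, gt), x)     # d(CE)/dx

print("prob - gt:", analytic)
print("numeric  :", numeric)
```

The same difference, up to sign and a constant factor, is also the gradient of a squared error taken with respect to the softmax output itself rather than the logits, which may be why the two readings are easy to confuse.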