longcw / yolo2-pytorch

YOLOv2 in PyTorch

CrossEntropy Loss or MSELoss in cls_loss? #31

Open xmfbit opened 6 years ago

xmfbit commented 6 years ago

Good work. But I am confused about how the cls loss is calculated. Your code seems to use MSELoss. However, in darknet the formula used when computing the gradient looks like that of cross-entropy: see https://github.com/pjreddie/darknet/blob/master/src/region_layer.c#L130. Besides, in the YOLO9000 paper, the author seems to use MSELoss, just as he did in YOLOv1.

So could you check this? Thank you.
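For reference, here is the standard derivation behind the "looks like cross-entropy" remark. The symbols are assumptions made for this note, not names from the repository or from darknet: x is the vector of class logits, p = softmax(x), and y is the one-hot ground truth.

```latex
\begin{aligned}
p_k &= \frac{e^{x_k}}{\sum_j e^{x_j}}, \qquad
L_{\mathrm{CE}} = -\sum_k y_k \log p_k, \\
\frac{\partial L_{\mathrm{CE}}}{\partial x_k} &= p_k - y_k
\quad\Longrightarrow\quad
-\frac{\partial L_{\mathrm{CE}}}{\partial x_k} = y_k - p_k .
\end{aligned}
```

So a delta of the form (ground truth - probability), applied directly to the logits, is the negative gradient of softmax cross-entropy.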

xmfbit commented 6 years ago

OK... I see. You used a one-hot vector as gt_classes. But that raises a new question: the gradient (gt_class - prob) should be passed directly to the output of the final conv layer (let's call it x), whereas your code uses softmax(x) (prob_pred = F.softmax(score_pred.view(-1, score_pred.size()[-1])).view_as(score_pred)), so the autograd mechanism will backpropagate through the softmax operation. Is that right?
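To make the question concrete, here is a minimal sketch in plain Python (no PyTorch) that compares the two gradients with respect to x. It is not the repository's code. The function names, the example logits, and the choice of a summed (not averaged) squared error are all assumptions made for illustration.

```python
import math

def softmax(x):
    # Numerically stable softmax over a list of logits.
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]

def ce_loss(x, y):
    # Cross-entropy between one-hot y and softmax(x).
    p = softmax(x)
    return -sum(yk * math.log(pk) for yk, pk in zip(y, p))

def mse_on_softmax_loss(x, y):
    # Summed squared error between softmax(x) and y (assumed reduction).
    p = softmax(x)
    return sum((pk - yk) ** 2 for pk, yk in zip(p, y))

def ce_grad(x, y):
    # Analytic gradient of cross-entropy w.r.t. the logits: p - y.
    p = softmax(x)
    return [pk - yk for pk, yk in zip(p, y)]

def mse_on_softmax_grad(x, y):
    # Chain rule through softmax: dL/dx_j = p_j * (g_j - sum_i g_i * p_i),
    # where g = dL/dp = 2 * (p - y).
    p = softmax(x)
    g = [2.0 * (pk - yk) for pk, yk in zip(p, y)]
    dot = sum(gk * pk for gk, pk in zip(g, p))
    return [pk * (gk - dot) for pk, gk in zip(p, g)]

def numeric_grad(loss, x, y, eps=1e-6):
    # Central finite differences, used only to check the formulas above.
    grad = []
    for j in range(len(x)):
        hi, lo = list(x), list(x)
        hi[j] += eps
        lo[j] -= eps
        grad.append((loss(hi, y) - loss(lo, y)) / (2 * eps))
    return grad

if __name__ == "__main__":
    x = [2.0, 1.0, 0.1]   # hypothetical logits for three classes
    y = [1.0, 0.0, 0.0]   # one-hot ground truth
    print("CE grad wrt x:          ", ce_grad(x, y))
    print("MSE(softmax) grad wrt x:", mse_on_softmax_grad(x, y))
```

Under these assumptions the two gradients are not the same: the cross-entropy gradient is exactly p - y (the negative of gt_class - prob), while the MSE-on-softmax gradient carries the extra softmax Jacobian factor. So if the loss is an MSE applied after softmax, autograd does backpropagate through the softmax, and the resulting delta on x differs from the one described above.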

AndresPMD commented 6 years ago

Hello xmfbit,

Did you manage to understand the loss function? I am struggling with that as well.