Broad-sky opened this issue 3 years ago
@Broad-sky
Training with MSELoss on a sigmoid output can make the model fall into local minima. Many other YOLO re-implementation projects use BCELoss or BCEWithLogitsLoss, unlike this project.
Yeah, you're right. I tried both BCE and MSE for the objectness loss, but I found that MSE consistently gave the better result, so I chose MSE over BCE.
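For readers comparing the two options, here is a minimal sketch (not the repo's actual code; all shapes and values are hypothetical) of how the objectness loss looks under each choice:

```python
import torch
import torch.nn as nn

# Hypothetical batch: 8 images, 845 predictions each (e.g. 13*13*5 anchors).
pred_obj = torch.randn(8, 845)                # raw objectness logits
gt_obj = (torch.rand(8, 845) > 0.9).float()   # hypothetical 0/1 targets

# Option 1: MSE on the sigmoid output (the choice kept in this project).
mse_loss = nn.MSELoss(reduction='sum')(torch.sigmoid(pred_obj), gt_obj)

# Option 2: BCE on the raw logits, as many other re-implementations do.
# BCEWithLogitsLoss fuses the sigmoid into the loss for numerical stability.
bce_loss = nn.BCEWithLogitsLoss(reduction='sum')(pred_obj, gt_obj)
```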
@yjh0410
Hi! I recently developed and published a Python package, plotbbox, a tool for plotting pretty bounding boxes. I'd like to share it with you!
Looks so pretty! Thanks a lot!
@yjh0410 Thanks a lot! What object detection algorithms are you studying now? Do you have any recommendations?
Sorry, I haven't studied object detection deeply for a long time; I just optimize existing projects. My advisor is considering having me do research on Temporal Action Detection.
Last week, I read the OneNet paper, which removes post-processing, including NMS, to make the object detection pipeline more concise. The goal of OneNet is to make each prediction correspond to exactly one object, so NMS is no longer needed to filter bboxes. I think that how to ensure one prediction corresponds to one object, rather than multiple predictions corresponding to one object (which forces us to use NMS), is a good research direction.
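For context, below is a minimal sketch of the NMS step that OneNet removes, using torchvision's built-in nms (the boxes and scores are hypothetical placeholders):

```python
import torch
from torchvision.ops import nms

boxes = torch.tensor([[10., 10., 100., 100.],
                      [12., 12., 102., 102.],    # near-duplicate of the first box
                      [200., 200., 300., 300.]])
scores = torch.tensor([0.9, 0.8, 0.7])

# Conventional detectors keep only the highest-scoring box among heavy
# overlaps; a one-to-one assignment scheme like OneNet's aims to make
# this filtering step unnecessary.
keep = nms(boxes, scores, iou_threshold=0.5)
print(keep)  # tensor([0, 2]): the duplicate at index 1 is suppressed
```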
@yjh0410 Thanks! Their idea looks so simple in terms of implementation, but the effect is great!
@yjh0410 @developer0hye
cls_loss = torch.sum(cls_loss_function(pred_cls, gt_cls) * gt_mask) / batch_size
Why divide by batch_size instead of by the number of all samples?
Could you explain that? Hope you can reply, thanks!
@Broad-sky It is well known that we must normalize the loss by the batch size. I also tried dividing it by the number of all samples, but it didn't work.
@Broad-sky
We only calculate the class loss for positive samples. If we divided the output of the loss function by the number of all samples, the class loss would be a very small value, so its gradient would be close to 0 and training would fail.
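A toy numerical sketch of this argument (all shapes and values below are hypothetical, not taken from the repo):

```python
import torch

batch_size, num_preds = 8, 845
per_pred_loss = torch.full((batch_size, num_preds), 2.3)  # e.g. -log(1/20) per prediction
gt_mask = torch.zeros(batch_size, num_preds)
gt_mask[:, :5] = 1.0                      # only ~5 positive samples per image

masked = torch.sum(per_pred_loss * gt_mask)

print(masked / batch_size)                # ~11.5: a healthy loss scale
print(masked / (batch_size * num_preds))  # ~0.014: loss (and gradient) near zero
```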
"txty_loss_function = nn.BCEWithLogitsLoss(reduction='none')" doesn't match the original paper? can you explain it.
thanks!!
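For context on this question, here is a minimal sketch (hypothetical shapes) of how such a loss is applied to the center offsets. Since the targets for sigma(tx), sigma(ty) are fractional cell offsets in [0, 1), BCE with soft (non-binary) targets is well defined, even though the original paper writes this term as a squared error:

```python
import torch
import torch.nn as nn

txty_loss_function = nn.BCEWithLogitsLoss(reduction='none')

pred_txty = torch.randn(8, 845, 2)   # raw tx, ty logits
gt_txty = torch.rand(8, 845, 2)      # fractional cell offsets in [0, 1)
txty_loss = txty_loss_function(pred_txty, gt_txty).sum(dim=-1)
```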