Broad-sky opened this issue 3 years ago
@Broad-sky
Training with MSELoss on a sigmoid output can make the model fall into local minima. Many other YOLO re-implementation projects use BCELoss or BCEWithLogitsLoss, unlike this project.
Yeah, you're right. I tried both BCE and MSE for the objectness loss, but I found that MSE consistently gave the better result, so I chose MSE over BCE.
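For readers comparing the two options, here is a minimal sketch (not the repo's actual code; all shapes and values are hypothetical) of how the objectness loss looks under each choice:

```python
import torch
import torch.nn as nn

# Hypothetical batch: 8 images, 845 predictions each (e.g. 13*13*5 anchors).
pred_obj = torch.randn(8, 845)                # raw objectness logits
gt_obj = (torch.rand(8, 845) > 0.9).float()   # hypothetical 0/1 targets

# Option 1: MSE on the sigmoid output (the choice kept in this project).
mse_loss = nn.MSELoss(reduction='sum')(torch.sigmoid(pred_obj), gt_obj)

# Option 2: BCE on the raw logits, as many other re-implementations do.
# BCEWithLogitsLoss fuses the sigmoid into the loss for numerical stability.
bce_loss = nn.BCEWithLogitsLoss(reduction='sum')(pred_obj, gt_obj)
```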
@yjh0410
Hi! I recently developed and published a Python package, plotbbox, a tool for plotting pretty bounding boxes. I'd like to share it with you!
Looks so pretty! Thanks a lot!
@yjh0410 Thanks a lot! What object detection algorithms are you studying now? Do you have any recommendations?
Sorry, I haven't studied object detection deeply for a long time; I just optimize existing projects. My advisor is considering having me do research on Temporal Action Detection.
Last week, I read the OneNet paper, which removes post-processing, including NMS, to make the object detection pipeline more concise. The goal of OneNet is to make each prediction correspond to exactly one object, so NMS is no longer needed to filter bboxes. I think that how to ensure one prediction corresponds to one object, rather than multiple predictions corresponding to one object (which forces us to use NMS), is a good research direction.
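For context, below is a minimal sketch of the NMS step that OneNet removes, using torchvision's built-in nms (the boxes and scores are hypothetical placeholders):

```python
import torch
from torchvision.ops import nms

boxes = torch.tensor([[10., 10., 100., 100.],
                      [12., 12., 102., 102.],    # near-duplicate of the first box
                      [200., 200., 300., 300.]])
scores = torch.tensor([0.9, 0.8, 0.7])

# Conventional detectors keep only the highest-scoring box among heavy
# overlaps; a one-to-one assignment scheme like OneNet's aims to make
# this filtering step unnecessary.
keep = nms(boxes, scores, iou_threshold=0.5)
print(keep)  # tensor([0, 2]): the duplicate at index 1 is suppressed
```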
@yjh0410 Thanks! Their idea looks so simple in terms of implementation, but the effect is great!
@yjh0410 @developer0hye
cls_loss = torch.sum(cls_loss_function(pred_cls, gt_cls) * gt_mask) / batch_size
Why divide by batch_size instead of by the number of all samples?
Could you explain that? Hope you can reply, thanks!
@Broad-sky It is well known that we must normalize the loss by the batch size. I also tried dividing it by the number of all samples, but it didn't work.
@Broad-sky
We only calculate the class loss for positive samples. If we divided the output of the loss function by the number of all samples, the class loss would be a very small value, so its gradient would be close to 0 and training would fail.
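A toy numerical sketch of this argument (all shapes and values below are hypothetical, not taken from the repo):

```python
import torch

batch_size, num_preds = 8, 845
per_pred_loss = torch.full((batch_size, num_preds), 2.3)  # e.g. -log(1/20) per prediction
gt_mask = torch.zeros(batch_size, num_preds)
gt_mask[:, :5] = 1.0                      # only ~5 positive samples per image

masked = torch.sum(per_pred_loss * gt_mask)

print(masked / batch_size)                # ~11.5: a healthy loss scale
print(masked / (batch_size * num_preds))  # ~0.014: loss (and gradient) near zero
```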
"txty_loss_function = nn.BCEWithLogitsLoss(reduction='none')" doesn't match the original paper? can you explain it.
thanks!!
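For context on this question, here is a minimal sketch (hypothetical shapes) of how such a loss is applied to the center offsets. Since the targets for sigma(tx), sigma(ty) are fractional cell offsets in [0, 1), BCE with soft (non-binary) targets is well defined, even though the original paper writes this term as a squared error:

```python
import torch
import torch.nn as nn

txty_loss_function = nn.BCEWithLogitsLoss(reduction='none')

pred_txty = torch.randn(8, 845, 2)   # raw tx, ty logits
gt_txty = torch.rand(8, 845, 2)      # fractional cell offsets in [0, 1)
txty_loss = txty_loss_function(pred_txty, gt_txty).sum(dim=-1)
```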