jeremyfix / deeplearning-lectures

Deep learning lecture materials
https://teaching.pages.centralesupelec.fr/deeplearning-lectures-build/

question about the loss function in object detection lecture #14

Closed vxgu86 closed 2 years ago

vxgu86 commented 2 years ago

For the largest object detection, why are you using b_loss and c_loss for the backward pass while leaving out regression_loss? And why do you add regression_loss to the total loss and return it?

    # On the first steps, b_loss around 0.1, c_loss around 20.0
    b_loss = alpha_bbox * bbox_loss(outputs[0], bboxes)
    c_loss = class_loss(outputs[1], labels)

    # For the total loss
    regression_loss += bbox_reg_loss(outputs[0], bboxes).item()/4.0

    # For the total accuracy
    predicted_targets = outputs[1].argmax(dim=1)

    # Backward and optimize
    optimizer.zero_grad()
    b_loss.backward()
    c_loss.backward()

    return regression_loss/N, correct/N

The same question remains for the multi-object detection: you compute b_loss, c_loss and obj_loss and use those three losses for the backward pass, but you return regression_loss and objectness_loss. This does not match the loss function you gave in the lecture: $J = \sum_{i \in cells} \mathbb{1}^{obj}_i \left( L_{box}(i) + L_{class}(i) \right) + L_{obj}(i)$

    b_loss = alpha_bbox * torch.sum(grid_cells_regression_loss[has_obj_idx]) / num_gt_objects
    # Classification loss
    c_loss = torch.sum(grid_cells_classification_loss[has_obj_idx.view(-1)]) / num_gt_objects
    # Objectness loss
    obj_loss = F.binary_cross_entropy(torch.sigmoid(predicted_hasobj), has_obj.float())

    # For the total loss
    regression_loss += num_gt_objects * b_loss.item()/alpha_bbox
    objectness_loss += inputs.shape[0] * obj_loss.item()
jeremyfix commented 2 years ago

The regression loss is not really dropped; I distinguish the loss used for updating the model's parameters from the loss used for reporting to the user. For updating the model, the regression loss is given by b_loss and the classification loss by c_loss. For reporting to the user, I use the L1 loss (not smoothed) and the accuracy.
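
Schematically, the training step looks something like this (a sketch, not the verbatim lab code; the model is assumed to return a (predicted boxes, class logits) tuple and the alpha_bbox value is just an arbitrary weighting):

    import torch
    import torch.nn as nn

    # Losses that drive the parameter updates (mirroring bbox_loss / class_loss above)
    bbox_loss = nn.SmoothL1Loss()
    class_loss = nn.CrossEntropyLoss()
    # Metric computed only for reporting; no backward is ever called on it
    bbox_reg_loss = nn.L1Loss()

    def train_step(model, optimizer, inputs, bboxes, labels, alpha_bbox=10.0):
        outputs = model(inputs)  # assumed: (predicted boxes, class logits)
        b_loss = alpha_bbox * bbox_loss(outputs[0], bboxes)
        c_loss = class_loss(outputs[1], labels)

        optimizer.zero_grad()
        (b_loss + c_loss).backward()  # gradients flow through both heads
        optimizer.step()

        with torch.no_grad():  # reporting only, detached from the graph
            reported_l1 = bbox_reg_loss(outputs[0], bboxes).item()
            accuracy = (outputs[1].argmax(dim=1) == labels).float().mean().item()
        return reported_l1, accuracy

Summing b_loss and c_loss before a single backward accumulates the same gradients as backpropagating each of them; the key point is that the plain L1 value and the accuracy are only logged, never backpropagated.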

The SmoothL1Loss does contain a quadratic part and, I believe, is not necessarily easy to reason about. I have the impression that the L1 loss (not smoothed) is easier to interpret for us.
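Concretely, with PyTorch's default beta=1, SmoothL1 is quadratic (0.5 * e**2) for errors below 1 and linear (|e| - 0.5) above, while the plain L1 loss is just |e|, so it reads directly as an average coordinate error:

    import torch
    import torch.nn.functional as F

    pred, target = torch.tensor([0.1, 2.0]), torch.zeros(2)
    print(F.smooth_l1_loss(pred, target, reduction='none'))  # tensor([0.0050, 1.5000])
    print(F.l1_loss(pred, target, reduction='none'))         # tensor([0.1000, 2.0000])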

The same applies to the multiple object detection. The "J" loss is the one on which the gradient is computed, and it actually involves b_loss, c_loss and obj_loss.
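In pseudo-code, using the variable names from your snippet (a sketch; the lab code may organize this slightly differently):

    # Loss on which the gradient is computed (the "J" of the lecture)
    J = b_loss + c_loss + obj_loss
    optimizer.zero_grad()
    J.backward()
    optimizer.step()

    # Counters accumulated only for reporting; .item() detaches them from the graph
    regression_loss += num_gt_objects * b_loss.item() / alpha_bbox
    objectness_loss += inputs.shape[0] * obj_loss.item()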

vxgu86 commented 2 years ago

Got it, thanks. I think the "largest object detection" does not actually mean "largest"; it almost always draws the bbox in the middle of the picture when there is more than one object. Maybe we should refer to it as the "problem of object detection".

jeremyfix commented 2 years ago

It is the "largest object detection" since the objects are filtered by size and the network is only requested to respond for that object. That it draws a bbox in the middle of the picture might be due to a bias in the way people do take pictures and align their camera.