junjie18 / CMT

[ICCV 2023] Cross Modal Transformer: Towards Fast and Robust 3D Object Detection

loss compute #71

Closed vehxianfish closed 3 months ago

vehxianfish commented 10 months ago

Hi @junjie18, thanks for your wonderful work and for releasing the code.
I have a question about the layout of the predicted bounding boxes. The prediction boxes are built by this code:

                pred_bbox = torch.cat(
                    (preds_dict[0]['center'][dec_id], preds_dict[0]['height'][dec_id],
                    preds_dict[0]['dim'][dec_id], preds_dict[0]['rot'][dec_id],
                    preds_dict[0]['vel'][dec_id]),
                    dim=-1
                )

which means pred_bbox is laid out as cx, cy, cz, w, l, h, rot.sin(), rot.cos(), vx, vy when it is compared with gt_boxes in the loss computation.

But denormalize_bbox, which is used at test time, reads:

def denormalize_bbox(normalized_bboxes, pc_range=None):
    # rotation 
    rot_sine = normalized_bboxes[..., 6:7]

    rot_cosine = normalized_bboxes[..., 7:8]
    rot = torch.atan2(rot_sine, rot_cosine)

    # center in the bev
    cx = normalized_bboxes[..., 0:1]
    cy = normalized_bboxes[..., 1:2]
    cz = normalized_bboxes[..., 4:5]

    # size
    w = normalized_bboxes[..., 2:3]
    l = normalized_bboxes[..., 3:4]
    h = normalized_bboxes[..., 5:6]

    w = w.exp() 
    l = l.exp() 
    h = h.exp() 

    if normalized_bboxes.size(-1) > 8:
        # velocity
        vx = normalized_bboxes[..., 8:9]
        vy = normalized_bboxes[..., 9:10]
        denormalized_bboxes = torch.cat([cx, cy, cz, w, l, h, rot, vx, vy], dim=-1)
    else:
        denormalized_bboxes = torch.cat([cx, cy, cz, w, l, h, rot], dim=-1)
    return denormalized_bboxes

Here the expected input format is cx, cy, w, l, cz, h, rot.sin(), rot.cos(), vx, vy. Could you explain the difference? Thanks a lot.
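
To make the mismatch concrete, the sketch below compares the two channel orders by name and derives the permutation between them. It is a plain-Python illustration, not code from the CMT repository: `CONCAT_LAYOUT`, `DENORM_LAYOUT`, `permutation`, and `reorder` are hypothetical names, and the channel labels follow the description in the question rather than a verified reading of the heads.

```python
# Channel names in the order the five head outputs are concatenated
# (center, height, dim, rot, vel), as described in the question.
# ASSUMPTION: labels are illustrative, not verified against the model.
CONCAT_LAYOUT = ["cx", "cy", "cz", "w", "l", "h", "sin", "cos", "vx", "vy"]

# Channel names at the indices that denormalize_bbox reads.
DENORM_LAYOUT = ["cx", "cy", "w", "l", "cz", "h", "sin", "cos", "vx", "vy"]


def permutation(src, dst):
    """Return indices p such that [src[i] for i in p] == dst."""
    return [src.index(name) for name in dst]


def reorder(values, perm):
    """Apply an index permutation to a flat list of channel values."""
    return [values[i] for i in perm]


perm = permutation(CONCAT_LAYOUT, DENORM_LAYOUT)

# Positions at which the two layouts put a different quantity.
mismatched = [
    i for i, (a, b) in enumerate(zip(CONCAT_LAYOUT, DENORM_LAYOUT)) if a != b
]

print("permutation:", perm)
print("mismatched indices:", mismatched)
```

Only the positions listed in `mismatched` differ, so the two layouts disagree on where cz, w, and l sit, while the rotation and velocity channels line up.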

vehxianfish commented 9 months ago

Hi @junjie18, this question still puzzles me. Could you explain? Thank you very much.

junjie18 commented 9 months ago

@vehxianfish

Hi, thanks for your detailed comment. I think it is a historical mistake from when I tried to separate the task and the head.

In the final version of CMT, only one task head is provided, so center, height, dim, rot and vel can be replaced with a single head (10 channels) here for acceleration. However, the computation is not exactly equivalent because of the non-linearity between the two layers.

It may be a small bug that has a small influence on the final result. You are welcome to share your new results.

https://github.com/junjie18/CMT/blob/47b6147c5981422ae5a732641ba76026ac958e21/projects/configs/fusion/cmt_voxel0075_vov_1600x640_cbgs.py#L242-L250
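
The single-head variant described above can be pictured as one flat 10-channel output that is sliced into named chunks. The sketch below is a plain-Python illustration under assumptions: the per-head channel counts in `HEAD_CHANNELS` (2, 1, 3, 2, 2) are inferred from the 10-channel total and are not copied from the linked config, and `split_channels` and `concat_heads` are hypothetical helpers. It only shows that splitting and re-concatenating preserves channel order; it says nothing about the numerical difference caused by the non-linearity inside each head.

```python
# ASSUMPTION: channel counts per named sub-head, in concatenation order.
HEAD_CHANNELS = {"center": 2, "height": 1, "dim": 3, "rot": 2, "vel": 2}


def split_channels(vec, head_channels):
    """Slice one flat output vector into named chunks."""
    chunks, start = {}, 0
    for name, n in head_channels.items():
        chunks[name] = vec[start:start + n]
        start += n
    if start != len(vec):
        raise ValueError("channel counts do not cover the whole vector")
    return chunks


def concat_heads(chunks, head_channels):
    """Concatenate the named chunks back in head order."""
    flat = []
    for name in head_channels:
        flat.extend(chunks[name])
    return flat


# Stand-in for the output of a single 10-channel head.
single_head_output = [float(i) for i in range(10)]

chunks = split_channels(single_head_output, HEAD_CHANNELS)
roundtrip = concat_heads(chunks, HEAD_CHANNELS)
```

Because the round trip returns the same 10 values in the same order, the chunk names act only as labels on fixed channel positions; which physical quantity a position carries is decided by the target layout it is compared with.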

vehxianfish commented 9 months ago

Thanks for your response. I will try it.