megvii-research / MOTRv2

[CVPR2023] MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors

Is the iterative bounding box refinement in motr.py redundant? #12

Closed chzhan closed 1 year ago

chzhan commented 1 year ago

A question: while debugging the code, I found that the decoder already performs iterative bounding box refinement, yet motr.py performs it again. Is this implementation duplicated?

deformable_transformer_plus.py +429

# hack implementation for iterative bounding box refinement
if self.bbox_embed is not None:
    tmp = self.bbox_embed[lid](output)
    if reference_points.shape[-1] == 4:
        new_reference_points = tmp + inverse_sigmoid(reference_points)
        new_reference_points = new_reference_points.sigmoid()
    else:
        assert reference_points.shape[-1] == 2
        new_reference_points = tmp
        new_reference_points[..., :2] = tmp[..., :2] + inverse_sigmoid(reference_points)
        new_reference_points = new_reference_points.sigmoid()
    reference_points = new_reference_points.detach()

if self.return_intermediate:
    intermediate.append(output)
    intermediate_reference_points.append(reference_points)

motr.py +542

for lvl in range(hs.shape[0]):
    if lvl == 0:
        reference = init_reference
    else:
        reference = inter_references[lvl - 1]
    reference = inverse_sigmoid(reference)
    outputs_class = self.class_embed[lvl](hs[lvl])
    tmp = self.bbox_embed[lvl](hs[lvl])
    if reference.shape[-1] == 4:
        tmp += reference
    else:
        assert reference.shape[-1] == 2
        tmp[..., :2] += reference
    outputs_coord = tmp.sigmoid()
    outputs_classes.append(outputs_class)
    outputs_coords.append(outputs_coord)

chzhan commented 1 year ago

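I think this is a bug. Following the code flow, the refinement is applied once inside the decoder and once outside it, and self.bbox_embed and hs are the same in both places, so it would be equivalent to implementing it only outside the decoder: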

reference = inverse_sigmoid(reference)
tmp = self.bbox_embed[lvl](hs[lvl])
tmp = reference + 2*tmp
outputs_coord = tmp.sigmoid()

That said, training this way and running inference this way probably does not cause any major problem.

zyayoung commented 1 year ago

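This part is kept from the Deformable DETR implementation. We do not know why the original repo computes it twice either; it may be because iterative bounding box refinement was added as a feature later.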
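
A toy check of the two quoted loops, in plain Python rather than PyTorch, may help when reading them side by side. Everything here is an assumption for illustration: the `deltas` list stands in for what `self.bbox_embed[lvl](hs[lvl])` would return, the numbers are made up, and only the 4-dimensional reference branch is modelled. Under these assumptions, the loop in motr.py appears to recompute the same box the decoder already stored in `inter_references[lvl]`, because it starts from `inter_references[lvl - 1]`, and this differs from `reference + 2*tmp`. The sketch cannot show gradients; in the quoted decoder snippet the refined reference is passed through `.detach()`, while the recomputation in motr.py is not.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def inverse_sigmoid(p, eps=1e-5):
    # Clamp to avoid log(0), similar in spirit to the repo's helper.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def refine(tmp, reference):
    # sigmoid(tmp + inverse_sigmoid(reference)), applied per coordinate.
    return [sigmoid(t + inverse_sigmoid(r)) for t, r in zip(tmp, reference)]


# Hypothetical stand-ins for bbox_embed[lvl](hs[lvl]) at three decoder layers.
deltas = [
    [0.30, -0.20, 0.10, 0.05],
    [-0.10, 0.15, -0.05, 0.20],
    [0.05, 0.05, -0.10, -0.15],
]
init_reference = [0.50, 0.40, 0.20, 0.30]  # made-up (cx, cy, w, h) in [0, 1]

# Decoder side: refine after each layer and record the refined reference.
reference = init_reference
inter_references = []
for tmp in deltas:
    reference = refine(tmp, reference)
    inter_references.append(reference)

# motr.py side: start each level from the reference that fed that layer.
outputs_coords = []
for lvl, tmp in enumerate(deltas):
    reference = init_reference if lvl == 0 else inter_references[lvl - 1]
    outputs_coords.append(refine(tmp, reference))

# The "offset applied twice" formula from the comment above, for level 0.
doubled = [
    sigmoid(inverse_sigmoid(r) + 2 * t)
    for t, r in zip(deltas[0], init_reference)
]

for lvl in range(len(deltas)):
    same = all(
        abs(a - b) < 1e-12
        for a, b in zip(outputs_coords[lvl], inter_references[lvl])
    )
    print(f"level {lvl}: outer loop matches decoder reference: {same}")
print("level 0 equals reference + 2*tmp:",
      all(abs(a - b) < 1e-12 for a, b in zip(outputs_coords[0], doubled)))
```

If this reading holds for the real model, the second computation is numerically redundant in the forward pass, which would be consistent with the maintainer's remark that the value is computed twice.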