Closed fengxiuyaun closed 1 year ago
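frame_res['pred_logits'] differs greatly depending on whether gtboxes is None or noised GT.
It is stated that an attn_mask is applied inside. The confidences in frame_res['pred_logits'] for the first several queries should therefore be roughly the same in both cases, but right now they differ a lot. It looks like the noised GT is affecting the first few queries.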
This is a self-attention layer. Can attn_mask be used here at all? Is the authors' code wrong?
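Thanks for the question. Could you describe your reproduction steps and code in detail? We have just checked,
and both of these points behave as expected. Applying attn_mask in the self-attention follows DN-DETR.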
Code
# First forward pass with the original query embeddings.
hs, init_reference, inter_references, enc_outputs_class, enc_outputs_coord_unact = \
    self.transformer(srcs, masks, pos, query_embed, ref_pts=ref_pts,
                     mem_bank=track_instances.mem_bank,
                     mem_bank_pad_mask=track_instances.mem_padding_mask,
                     attn_mask=attn_mask)
# Perturb the last query in place, then run the same forward pass again.
query_embed[-1].add_(-10000)
print(hs[-1])
breakpoint()
hs, init_reference, inter_references, enc_outputs_class, enc_outputs_coord_unact = \
    self.transformer(srcs, masks, pos, query_embed, ref_pts=ref_pts,
                     mem_bank=track_instances.mem_bank,
                     mem_bank_pad_mask=track_instances.mem_padding_mask,
                     attn_mask=attn_mask)
print(hs[-1])
breakpoint()
Result
-> self.transformer(srcs, masks, pos, query_embed, ref_pts=ref_pts,
(Pdb) c
tensor([[[-0.0473, -0.1809, -0.1649, ..., -0.3847, 0.4224, -0.1286],
[-0.0995, -0.0424, 0.1443, ..., -0.4425, 0.0478, -0.1900],
[ 0.0369, 0.0160, 0.1610, ..., -0.4706, -0.9238, -0.2897],
...,
[ 0.2081, -0.1846, -0.3163, ..., -0.5717, 0.3413, -0.0820],
[-0.1503, -0.0813, -0.4625, ..., -0.0906, 0.6473, -0.1895],
[-0.1096, -0.1370, -0.3655, ..., -0.2530, 1.3197, -0.1997]]],
device='cuda:0')
> /data/projects/motr_eccvw_to_publish/models/motr.py(544)_forward_single_image()
-> self.transformer(srcs, masks, pos, query_embed, ref_pts=ref_pts,
(Pdb) c
tensor([[[-0.0473, -0.1809, -0.1649, ..., -0.3847, 0.4224, -0.1286],
[-0.0995, -0.0424, 0.1443, ..., -0.4425, 0.0478, -0.1900],
[ 0.0369, 0.0160, 0.1610, ..., -0.4706, -0.9238, -0.2897],
...,
[ 0.2383, -0.1272, -0.1161, ..., -0.6548, -0.1152, -0.1674],
[-0.0791, -0.0713, -0.2048, ..., -0.2818, 0.4432, -0.0805],
[-0.1542, -0.1183, -0.5157, ..., -0.2627, 1.2313, -0.2040]]],
device='cuda:0')
> /data/projects/motr_eccvw_to_publish/models/motr.py(549)_forward_single_image()
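In the two printouts above, the first rows are identical and only the last rows change after the last query is perturbed. The same behaviour can be checked outside the model. The following is a minimal sketch in plain Python, not MOTRv2 code: `masked_attention`, the sizes `n_det` and `n_dn`, and the mask layout (the first `n_det` queries may not attend to the trailing `n_dn` ones) are all assumptions made for illustration.

```python
import math
import random


def masked_attention(queries, keys, values, blocked):
    """Single-head scaled dot-product attention on plain Python lists.

    blocked[i][j] == True means query i must not attend to key j.
    Every query needs at least one key that is not blocked.
    """
    dim = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        # Score query i against every key, then drop the blocked pairs.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        scores = [-math.inf if blocked[i][j] else s for j, s in enumerate(scores)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # blocked pairs get weight 0.0
        total = sum(weights)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))
        ])
    return outputs


random.seed(0)
n_det, n_dn, dim = 3, 2, 4  # hypothetical numbers of ordinary and noised-GT queries
n = n_det + n_dn
x = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]

# Assumed mask layout: the first n_det queries cannot see the last n_dn ones.
blocked = [[i < n_det and j >= n_det for j in range(n)] for i in range(n)]

before = masked_attention(x, x, x, blocked)

# Same experiment as in the thread: shift the last query by -10000 and rerun.
x_perturbed = [row[:] for row in x]
x_perturbed[-1] = [value - 10000.0 for value in x_perturbed[-1]]
after = masked_attention(x_perturbed, x_perturbed, x_perturbed, blocked)

unchanged = before[:n_det] == after[:n_det]
print("first n_det outputs unchanged:", unchanged)
```

The mask is applied to the query-key score matrix, so a blocked pair removes that key and its value from the output of that query, while unblocked queries still see every key.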
@zyayoung Thanks for the reply.
Before modifying the code:
self._forward_single_image(frame, tmp, None)['pred_logits']
tensor([[[-3.2966], [-4.3049], [-4.5905], [-3.0246], [-3.3853], [-3.2107], [-3.1802], [-2.6938], [-2.6116], [-3.3340], [-2.4464], [-2.4135], [-2.3779]]], device='cuda:0')
self._forward_single_image(frame, tmp, gtboxes)['pred_logits']
tensor([[[-6.0443], [-6.4851], [-6.7420], [-6.3519], [-5.3412], [-6.6307], [-6.4254], [-6.2463], [-6.3488], [-5.3368], [ 3.7604], [ 3.5694], [ 3.9735], [-5.1342], [-5.3880], [-5.1033]]], device='cuda:0')
After modifying the code at https://github.com/megvii-research/MOTRv2/blob/be49b7336218e470c9ebcd34be54fe7eec702675/models/deformable_transformer_plus.py#L356 as follows:
def _forward_self_cross(self, tgt, query_pos, reference_points, src, src_spatial_shapes,
                        level_start_index, src_padding_mask=None, attn_mask=None):
    # self attention
    if attn_mask is not None:
        # Count the unmasked entries in the first row of the mask, then run
        # self-attention separately on the two groups of queries.
        len_n_dt = sum(attn_mask[0]==False)
        tgt = torch.cat([self._forward_self_attn(tgt[:, :len_n_dt], query_pos[:, :len_n_dt]),
                         self._forward_self_attn(tgt[:, len_n_dt:], query_pos[:, len_n_dt:])], dim=1)
    else:
        tgt = self._forward_self_attn(tgt, query_pos, attn_mask)
    # cross attention
self._forward_single_image(frame, tmp, gtboxes)['pred_logits']
tensor([[[-3.2966], [-4.3049], [-4.5905], [-3.0246], [-3.3853], [-3.2107], [-3.1802], [-2.6938], [-2.6116], [-3.3340], [-2.4464], [-2.4135], [-2.3779], [-1.8376], [-2.1647], [-1.9602]]], device='cuda:0')
self._forward_single_image(frame, tmp, None)['pred_logits']
tensor([[[-3.2966], [-4.3049], [-4.5905], [-3.0246], [-3.3853], [-3.2107], [-3.1802], [-2.6938], [-2.6116], [-3.3340], [-2.4464], [-2.4135], [-2.3779]]], device='cuda:0')
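I think the self-attention module cannot use attn_mask. Reason: in self-attention, the output for each Q depends on all K and V (attn_mask can only mask out the other Q values, not the other K and V values).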
Sorry, I have found the bug. It is because I am using PyTorch 1.5. In PyTorch 1.5 the attn_mask input must be of float type and cannot be bool.
https://github.com/megvii-research/MOTRv2/blob/be49b7336218e470c9ebcd34be54fe7eec702675/models/motr.py#L672
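For readers on a PyTorch version that only accepts a float attn_mask, one possible workaround is to convert the boolean mask into an additive float mask before it reaches the attention layer. The sketch below is an assumption-laden illustration, not the fix adopted in MOTRv2: `to_float_mask` is a hypothetical helper, the sizes and mask layout are made up, and it assumes the usual `nn.MultiheadAttention` convention that `True` in a boolean mask means "not allowed to attend".

```python
import torch
from torch import nn


def to_float_mask(attn_mask: torch.Tensor) -> torch.Tensor:
    """Turn a boolean mask (True = blocked) into an additive float mask.

    Hypothetical helper: blocked positions become -inf, allowed positions 0.0.
    A mask that is not boolean is assumed to be additive already.
    """
    if attn_mask.dtype != torch.bool:
        return attn_mask
    float_mask = torch.zeros(attn_mask.shape, dtype=torch.float32,
                             device=attn_mask.device)
    return float_mask.masked_fill(attn_mask, float("-inf"))


torch.manual_seed(0)
n_det, n_dn, dim = 3, 2, 8  # hypothetical numbers of ordinary and noised-GT queries
n = n_det + n_dn

# Assumed mask layout: the first n_det queries cannot see the last n_dn ones.
bool_mask = torch.zeros(n, n, dtype=torch.bool)
bool_mask[:n_det, n_det:] = True
float_mask = to_float_mask(bool_mask)

attn = nn.MultiheadAttention(embed_dim=dim, num_heads=2)  # dropout defaults to 0.0
x = torch.randn(n, 1, dim)  # (sequence, batch, embedding)

with torch.no_grad():
    out, _ = attn(x, x, x, attn_mask=float_mask)
    # Shift the last query, as in the experiment earlier in the thread.
    x_perturbed = x.clone()
    x_perturbed[-1] -= 10000
    out_perturbed, _ = attn(x_perturbed, x_perturbed, x_perturbed,
                            attn_mask=float_mask)

same = torch.allclose(out[:n_det], out_perturbed[:n_det])
print("first n_det outputs unchanged:", same)
```

With the float mask in place, the outputs of the first `n_det` queries should no longer depend on the trailing queries, which is the behaviour the maintainers describe as expected.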