junjie18 / CMT

[ICCV 2023] Cross Modal Transformer: Towards Fast and Robust 3D Object Detection

About decode bbox and SeparateTask head. #85

xu19971109 opened this issue 8 months ago (status: Open)

xu19971109 commented 8 months ago
  1. I see that the loss uses the last (-1) decoder layer's output as the main loss, and that at eval time only the last transformer layer's output is used to decode the bboxes (roughly the pattern in the sketch below). Are the first 5 transformer layers used only for auxiliary losses?
  2. Does the SeparateTask head use only one task to predict all classes? The output seems to predict all classes with a single group. Have you experimented with CenterHead's original class groups?
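
To illustrate point 1, this is roughly the pattern I am referring to (a simplified sketch, not the actual repo code; the names and the placeholder loss are made up):

```python
import torch

def compute_losses(all_layer_preds, gt):
    """Deep supervision: every decoder layer's prediction is supervised,
    but only the last layer is treated as the 'main' loss."""
    losses = {}
    for i, preds in enumerate(all_layer_preds):         # [num_layers] x [B, num_query, dim]
        layer_loss = ((preds - gt) ** 2).mean()         # placeholder loss, for illustration only
        key = 'loss' if i == len(all_layer_preds) - 1 else f'd{i}.loss_aux'
        losses[key] = layer_loss
    return losses

def get_bboxes(all_layer_preds):
    """At eval time only the last decoder layer's output is decoded into boxes."""
    return all_layer_preds[-1]

# toy usage
num_layers, B, num_query, dim = 6, 2, 900, 10
all_layer_preds = [torch.randn(B, num_query, dim) for _ in range(num_layers)]
gt = torch.randn(B, num_query, dim)
print(compute_losses(all_layer_preds, gt).keys())
print(get_bboxes(all_layer_preds).shape)
```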
junjie18 commented 8 months ago

@xu19971109

  1. Yes.
  2. Yes, we use only one group. If you use multiple groups, you will need to put some effort into modifying Query Denoising (QD), since QD tends to turn GTs into negative samples (see the sketch below).
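
To make point 2 concrete, here is a rough sketch (not the actual CMT code; the class groups and label ids are hypothetical) of why CenterHead-style grouping clashes with QD, which generates denoising queries from all GTs:

```python
from typing import List
import torch

def dn_positive_masks_per_group(gt_labels: torch.Tensor,
                                group_classes: List[List[int]]):
    """Illustrative only: with CenterHead-style class groups, each group
    supervises only its own classes. Denoising (QD) queries are built from
    *all* GT boxes, so for a given group any GT whose class is outside that
    group's class set would fall through as a negative sample unless QD is
    modified to route each noised GT to the matching group."""
    masks = []
    for classes in group_classes:
        is_positive = torch.isin(gt_labels, torch.tensor(classes))
        masks.append(is_positive)   # False == this GT becomes a negative for this group
    return masks

# toy grouping (hypothetical ids): cars vs. pedestrians/cyclists
gt_labels = torch.tensor([0, 0, 7, 8])   # two cars, a pedestrian, a cyclist
groups = [[0], [7, 8]]
for mask in dn_positive_masks_per_group(gt_labels, groups):
    print(mask)
```

With a single group, every noised GT query has a valid positive target, so no such routing is needed.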