I noticed that the loss uses the output of the last transformer layer (index -1) as the main loss, and that at eval time only the last layer's output is converted to bboxes.
Are the first 5 transformer layers used only for auxiliary losses?
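To make the question concrete, a minimal sketch of the pattern described above follows. It is not the repository's code: the function names, the `aux{idx}.loss` key names, and the scalar stand-ins for layer outputs are all hypothetical, and a plain L1 loss is swapped in for the real detection losses.

```python
from typing import Callable, Dict, Sequence


def layerwise_losses(
    layer_outputs: Sequence[float],
    target: float,
    loss_fn: Callable[[float, float], float],
) -> Dict[str, float]:
    """Return one loss per decoder layer; the last layer's loss is the main one."""
    if not layer_outputs:
        raise ValueError("need at least one decoder layer output")
    last = len(layer_outputs) - 1
    losses: Dict[str, float] = {}
    for idx, output in enumerate(layer_outputs):
        # Hypothetical key names: "loss" for the last layer,
        # "aux{idx}.loss" for each earlier (auxiliary) layer.
        key = "loss" if idx == last else f"aux{idx}.loss"
        losses[key] = loss_fn(output, target)
    return losses


def decode_for_eval(layer_outputs: Sequence[float]) -> float:
    """At eval time, keep only the last decoder layer's output."""
    return layer_outputs[-1]


def l1(pred: float, target: float) -> float:
    """Stand-in loss; the real head would use classification and box losses."""
    return abs(pred - target)


# Six decoder layers: five auxiliary plus one main (scalars stand in for tensors).
outputs = [7, 5, 4, 3, 3, 2]
losses = layerwise_losses(outputs, 2, l1)
total = sum(losses.values())         # all six losses contribute to training
prediction = decode_for_eval(outputs)  # only the last layer is used at eval
```

In this sketch every layer contributes a training loss, but only `outputs[-1]` is decoded at eval, which is the behaviour the question asks about.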
Does SeparateTask use only one task to predict all classes? The output seems to predict all classes in a single group. Have you experimented with CenterHead's original groups?
Yes, we only use one group. If you use multiple groups, you will need to put some effort into modifying Query Denoise (QD), since QD tends to make GTs negative samples.