How should the heat map loss be refined for a multi-label object detection problem?

xingyizhou / CenterNet

Object detection, 3D detection, and pose estimation using center point detection

MIT License


How should the heat map loss be refined for a multi-label object detection problem? #763

Open Eddiesyn opened 4 years ago

Eddiesyn commented 4 years ago

As the paper says, the 'training objective is a penalty-reduced pixel-wise logistic regression with focal loss'. I assume this loss can be applied directly in the multi-label case? If so, how should the decoding be handled?

xingyizhou commented 4 years ago

Yes, it handles multi-label by default. No code changes are needed. The decoder extracts peaks from each channel separately.
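To illustrate why no change is needed, here is a minimal pure-Python sketch of the penalty-reduced focal loss as the paper describes it, with the paper's defaults alpha = 2 and beta = 4. The function name, the nested-list `[C][H][W]` layout, and the `eps` clamp are assumptions made for this example, not the repository's code. Each channel is scored with its own independent logistic term, so one pixel can be a positive in several channels at once.

```python
import math

def penalty_reduced_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-12):
    """Hypothetical helper: pred and target are nested lists shaped [C][H][W].

    pred holds per-channel sigmoid outputs in (0, 1); target holds the
    Gaussian-splatted ground truth in [0, 1], equal to 1.0 at object centers.
    """
    total = 0.0
    num_pos = 0
    for pred_ch, tgt_ch in zip(pred, target):
        for pred_row, tgt_row in zip(pred_ch, tgt_ch):
            for p, y in zip(pred_row, tgt_row):
                p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
                if y == 1.0:
                    # Positive location: standard focal term.
                    total += (1.0 - p) ** alpha * math.log(p)
                    num_pos += 1
                else:
                    # Negative location: penalty reduced near a center by (1 - y)^beta.
                    total += (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p)
    # Normalize by the number of center points (at least 1 to avoid division by zero).
    return -total / max(num_pos, 1)

# Multi-label target: the same pixel is a center in both channels.
target = [[[1.0, 0.0]], [[1.0, 0.0]]]
pred_good = [[[0.9, 0.1]], [[0.9, 0.1]]]  # both labels predicted
pred_miss = [[[0.9, 0.1]], [[0.1, 0.1]]]  # second label missed
loss_good = penalty_reduced_focal_loss(pred_good, target)
loss_miss = penalty_reduced_focal_loss(pred_miss, target)
```

In this sketch, missing the second label raises the loss even though the first channel is predicted well, because the channels do not compete the way they would under a softmax.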

Eddiesyn commented 4 years ago

Thank you for your reply.

What if a single object yields different centers (peaks) in different channels, for example two centers in two channels that are extracted as neighbours but actually belong to one object? In the single-label classification case we could simply keep the higher-scoring one, but in the multi-label case this could be very dodgy.

xingyizhou commented 4 years ago

We will produce two objects in this case. As far as I know, Faster R-CNN also produces two bounding boxes for the same proposal as long as both class scores after softmax are larger than 0.05. See the implementation of multi-class NMS in the detectron2 codebase.
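The per-channel decoding described in this thread can be sketched roughly as follows. This is a simplified pure-Python illustration, assuming a 3x3 local-maximum test and a score threshold; `extract_peaks`, the `threshold` parameter, and the nested-list layout are hypothetical names chosen for the example, not the repository's decode function. It shows that neighbouring peaks in two different channels come out as two separate detections.

```python
def extract_peaks(heatmap, threshold=0.3):
    """Hypothetical helper: heatmap is a nested list shaped [C][H][W].

    Returns a list of (class_id, y, x, score), one entry per local maximum,
    where each channel is searched independently of the others.
    """
    peaks = []
    for class_id, channel in enumerate(heatmap):
        height = len(channel)
        width = len(channel[0])
        for y in range(height):
            for x in range(width):
                score = channel[y][x]
                if score < threshold:
                    continue
                # 3x3 neighbourhood (clipped at the border), same channel only.
                neighbours = [
                    channel[ny][nx]
                    for ny in range(max(y - 1, 0), min(y + 2, height))
                    for nx in range(max(x - 1, 0), min(x + 2, width))
                ]
                if score >= max(neighbours):
                    peaks.append((class_id, y, x, score))
    return peaks

# One object responding in two channels at neighbouring pixels.
heatmap = [
    [[0.1, 0.2, 0.1], [0.2, 0.9, 0.3], [0.1, 0.2, 0.1]],  # channel 0
    [[0.0, 0.1, 0.2], [0.1, 0.4, 0.8], [0.0, 0.1, 0.2]],  # channel 1
]
detections = extract_peaks(heatmap)
# detections == [(0, 1, 1, 0.9), (1, 1, 2, 0.8)]
```

Because the local-maximum test never compares scores across channels, the 0.8 peak in channel 1 is kept even though it sits next to the 0.9 peak in channel 0. Merging such pairs would need an extra, class-agnostic suppression step that is not part of this sketch.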