hu64 / SpotNet

Repository for the paper SpotNet: Self-Attention Multi-Task Network for Object Detection
MIT License
51 stars 10 forks

The Average Recall is lower than CenterNet's. Is the reason that small values (nearly 0) on the attention map weaken some keypoints? #2

Closed. sisrfeng closed 3 years ago

sisrfeng commented 4 years ago

I think the attention map lowers the values of some of the predicted keypoints, which means detection is stricter than in CenterNet. Will the Average Recall become lower as a result? I also do not fully understand the difference between COCO AP and AR. Would you mind sharing your explanation? Many thanks!

sisrfeng commented 4 years ago

multi-task learning by itself (SpotNet No Attention) helps to be more precise, but does not help to detect more objects, i.e. to reach improved values of recall. On the other hand, the attention mechanism does both, it helps to be even more precise for the same values of recall (fewer false positives), and it also allows the model to detect more and reach significantly higher values of recall.

Since the network is looking for keypoints on the whole image, it is natural that concentrating the search on learned foreground pixels will increase the probability that the key points found belong to the objects of interest, thus reducing the rate of false positives. Furthermore, the experiments show that this increases recall because the network can concentrate on useful information.

I think concentrating on useful information will improve precision. But if the segmentation map predicted by the segmentation head is not as good as the ground truth, and it lowers the values of some of the predicted keypoints (in the extreme case, every pixel of the segmentation map is 0), then detection will be bad.
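
To make the concern concrete, here is a toy sketch. It assumes, purely for illustration, that the attention map is multiplied element-wise with the keypoint heatmap and that a detection is any cell at or above a score threshold; the real SpotNet may apply attention at a different point in the network. All names and values here are hypothetical.

```python
# Toy illustration only. ASSUMPTION: the attention map multiplies the
# keypoint heatmap element-wise; SpotNet's real architecture may differ.

def apply_attention(heatmap, attention):
    """Element-wise product of two equally sized 2D lists."""
    return [[h * a for h, a in zip(h_row, a_row)]
            for h_row, a_row in zip(heatmap, attention)]

def count_detections(heatmap, threshold):
    """Number of cells whose score is at or above the threshold."""
    return sum(score >= threshold for row in heatmap for score in row)

# Hypothetical 2x2 keypoint heatmap with two strong peaks.
heatmap = [[0.9, 0.1],
           [0.2, 0.8]]

# An attention map that keeps the peaks, and a degenerate all-zero one.
good_attention = [[1.0, 0.0],
                  [0.0, 1.0]]
zero_attention = [[0.0, 0.0],
                  [0.0, 0.0]]

kept_good = count_detections(apply_attention(heatmap, good_attention), 0.5)
kept_zero = count_detections(apply_attention(heatmap, zero_attention), 0.5)
print(kept_good, kept_zero)
```

Under this assumption, a good attention map keeps both peaks, while an all-zero map suppresses every keypoint, so recall would drop to zero.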

hu64 commented 4 years ago

I think we can say that at very high recall values on the curve, precision becomes bad. The idea is that with the attention map, this "bad" precision still yields more true positives than without it, because the model is able to concentrate on better areas. This mainly concerns detections with a very low confidence score. Note that the datasets used in our paper are not evaluated with COCO AP and COCO AR, but with the precision-recall curve.
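
As a rough illustration of how a precision-recall curve is traced (a minimal sketch, not the evaluation code used for the paper), detections can be sorted by confidence and precision and recall recomputed after each one. The function name, the detection list, and the ground-truth count are made up for the example.

```python
# Minimal sketch of a precision-recall curve. ASSUMPTION: each detection is
# already labelled as a true or false positive; matching is not shown.

def precision_recall_curve(detections, num_ground_truth):
    """detections: list of (confidence, is_true_positive) pairs.

    Returns a list of (recall, precision) points, one per detection,
    in order of decreasing confidence.
    """
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    true_pos = 0
    false_pos = 0
    curve = []
    for _confidence, is_true_positive in ordered:
        if is_true_positive:
            true_pos += 1
        else:
            false_pos += 1
        precision = true_pos / (true_pos + false_pos)
        recall = true_pos / num_ground_truth
        curve.append((recall, precision))
    return curve

# Hypothetical detections: (confidence, is_true_positive).
detections = [(0.9, True), (0.8, True), (0.6, False), (0.3, True), (0.1, False)]
curve = precision_recall_curve(detections, num_ground_truth=4)
for recall, precision in curve:
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this toy data the low-confidence detection at 0.3 is what raises recall from 0.50 to 0.75, which is the region of the curve discussed above: precision is lower there, but extra true positives are still being found.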