Closed — sisrfeng closed this issue 3 years ago
Multi-task learning by itself (SpotNet No Attention) helps the model be more precise, but does not help it detect more objects, i.e. reach higher values of recall. The attention mechanism, on the other hand, does both: it makes the model even more precise at the same recall (fewer false positives), and it also allows it to detect more objects, reaching significantly higher values of recall.
Since the network searches for keypoints over the whole image, it is natural that concentrating the search on learned foreground pixels increases the probability that the keypoints found belong to the objects of interest, thus reducing the rate of false positives. Furthermore, the experiments show that this also increases recall, because the network can concentrate on useful information.
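As a minimal sketch of this idea — assuming, as the thread suggests, that the predicted foreground (segmentation) map is multiplied element-wise with the keypoint heatmap; the arrays below are purely illustrative, not the paper's actual code:

```python
import numpy as np

# Illustrative keypoint heatmap (values in [0, 1]) over a 4x4 image.
heatmap = np.array([
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],  # strong peak on a real object
    [0.0, 0.1, 0.0, 0.7],  # spurious peak on background clutter
    [0.0, 0.0, 0.1, 0.0],
])

# Predicted foreground map used as an attention mask (1 = foreground).
attention = np.array([
    [1.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])

# Element-wise product keeps keypoints on learned foreground pixels and
# suppresses peaks on the background, reducing false positives.
attended = heatmap * attention
print(attended[1, 1])  # keypoint on the object survives
print(attended[2, 3])  # background peak is suppressed to 0
```

This also illustrates the failure mode raised later in the thread: if the predicted mask were all zeros, every keypoint value would be zeroed out.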
I think concentrating on useful information will improve precision, but if the segmentation map predicted by the segmentation head is not as good as the ground truth, it can lower the values of some predicted keypoints. For example, if every pixel of the segmentation map is 0, then detection fails entirely.
I think we can say that for very high values of recall on the curve, precision becomes bad. The idea is that with the attention map, this "bad" precision still hits more true positives than without it, because the model can concentrate on better areas. This mainly concerns detections with a very low confidence score. In our paper, the datasets we use are not evaluated with COCO AP and COCO AR, but rather with the precision-over-recall curve.
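As a minimal sketch of how a precision-over-recall curve is built from scored detections (toy data, not the paper's evaluation code), showing why precision drops as low-confidence detections extend the curve toward high recall:

```python
import numpy as np

# Detections sorted by descending confidence; True = matched a ground truth.
is_tp = np.array([True, True, False, True, False, False, True, False])
num_gt = 5  # total ground-truth objects in the toy set

tp_cum = np.cumsum(is_tp)    # true positives accumulated down the ranking
fp_cum = np.cumsum(~is_tp)   # false positives accumulated down the ranking
precision = tp_cum / (tp_cum + fp_cum)
recall = tp_cum / num_gt

for p, r in zip(precision, recall):
    print(f"recall={r:.2f}  precision={p:.2f}")
# Walking down the ranking, recall only rises, while precision degrades
# as low-confidence false positives accumulate -- the "bad" precision
# at the high-recall end of the curve.
```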
I think the attention map makes the values of some predicted keypoints lower, which means the detection is stricter than in CenterNet. Will the average recall become lower? I also do not fully understand the difference between COCO AP and AR. Would you mind sharing your explanation? Many thanks!
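For reference, a simplified illustration of the difference: COCO AP averages interpolated precision over a recall grid (and, in the real metric, over IoU thresholds 0.50:0.95 and over categories), while COCO AR is the maximum recall achieved given a fixed detection budget, likewise averaged over IoU thresholds. A toy sketch at a single IoU threshold (illustrative numbers only):

```python
import numpy as np

# Toy precision/recall points from one precision-over-recall curve
# (single IoU threshold; real COCO averages over IoU 0.50:0.95).
recall = np.array([0.2, 0.4, 0.6, 0.8])
precision = np.array([1.0, 0.9, 0.7, 0.5])

# COCO-style AP: interpolated precision (max precision at or beyond each
# recall level) averaged over a 101-point recall grid.
grid = np.linspace(0.0, 1.0, 101)
interp = np.array([
    precision[recall >= r].max() if (recall >= r).any() else 0.0
    for r in grid
])
ap = interp.mean()

# COCO-style AR: the best recall reached under the detection budget
# (here, all toy detections fit the budget).
ar = recall.max()

print(f"AP={ap:.3f}  AR={ar:.3f}")
```

So AP penalizes false positives along the whole curve, whereas AR only asks how many ground-truth objects are eventually found; an attention map that suppresses some keypoint scores would hurt AR only if it suppresses true objects below the detection threshold.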