thsant / wgisd

Embrapa Wine Grape Instance Segmentation Dataset - Embrapa WGISD

question on evaluation metrics #9

Closed andreaceruti closed 2 years ago

andreaceruti commented 2 years ago

Like you, I am working on grape bunch detection and tracking using the Detectron2 framework, but the available evaluators for the instance segmentation task provide metrics such as Average Precision, computed at different IoU thresholds, and Average Recall. It would really help me if you could share some information about your F1 metric implementation for instance segmentation.

thsant commented 2 years ago

Dear @andreaceruti , see Section 3.4 in https://arxiv.org/abs/1907.11819 - the explanation is there. See also Table 3 and the different values for IoU.

andreaceruti commented 2 years ago

@thsant Yes, I saw that section; my question refers to the actual code implementation. I am struggling with the cocoapi because the documentation is poor, so I would like to know if you used a particular API to calculate Pinst, Rinst, and F1.

Edit: from what I have understood, the principal standard metrics in instance segmentation are Average Precision and Average Recall, and an Average F1 can be computed from AP and AR. I think this differs a lot from what you present in your paper, and I have not found a standard implementation that calculates Precision and Recall at the instance level as your work does (I am talking about the instance segmentation task). So I was wondering if you could explain in more detail what you used in your paper, so that I can use it as a state-of-the-art baseline.

thsant commented 2 years ago

@andreaceruti , unlike COCO, VOC, or other computer vision benchmarks, our focus here is on yield, so we are interested in evaluating the total number of grape clusters found and missed. The numbers of true positives (TP), false positives (FP), and false negatives (FN) are therefore computed considering all 408 clusters (the detections and misses are integrated over all 27 images).

So, to compare with our results, you have to accumulate TP, FN, and FP over the entire set of 408 clusters and then compute Precision and Recall. F1 is the harmonic mean of Precision and Recall.
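The procedure described above could be sketched as follows. This is not the authors' code, just a minimal illustration assuming binary instance masks and a simple greedy one-to-one matching at a given IoU threshold; all function names here are hypothetical:

```python
import numpy as np

def mask_iou(a, b):
    """IoU between two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def match_instances(gt_masks, pred_masks, iou_thr=0.5):
    """Greedy one-to-one matching; returns (TP, FP, FN) for one image."""
    matched_gt = set()
    tp = 0
    for pm in pred_masks:
        best_iou, best_j = 0.0, None
        for j, gm in enumerate(gt_masks):
            if j in matched_gt:
                continue
            iou = mask_iou(pm, gm)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_thr:
            matched_gt.add(best_j)
            tp += 1
    fp = len(pred_masks) - tp          # unmatched predictions
    fn = len(gt_masks) - len(matched_gt)  # unmatched ground-truth clusters
    return tp, fp, fn

def dataset_prf1(per_image_counts):
    """Integrate TP/FP/FN over the whole dataset, then compute P, R, F1."""
    tp = sum(c[0] for c in per_image_counts)
    fp = sum(c[1] for c in per_image_counts)
    fn = sum(c[2] for c in per_image_counts)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

The key point is that Precision and Recall come from counts summed over every image, not from averaging per-image scores.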

thsant commented 2 years ago

Also consider the Matterport utils for Mask R-CNN (we used them in our work):

https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/utils.py

See in particular compute_matches and compute_ap.
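If I recall correctly, Matterport's compute_matches returns per-image gt_match and pred_match arrays, where each entry holds the index of the matched instance in the other set, or -1 when unmatched. Assuming that return format, a small helper (hypothetical, not part of the library) can turn those arrays into the dataset-level counts discussed above:

```python
import numpy as np

def aggregate_counts(matches_per_image):
    """Sum TP/FP/FN over a list of (gt_match, pred_match) pairs.

    Assumes the convention used by Matterport's compute_matches:
    pred_match[i] is the matched GT index or -1 (a false positive),
    gt_match[j] is the matched prediction index or -1 (a false negative).
    """
    tp = fp = fn = 0
    for gt_match, pred_match in matches_per_image:
        tp += int(np.sum(pred_match > -1))   # matched predictions
        fp += int(np.sum(pred_match == -1))  # spurious detections
        fn += int(np.sum(gt_match == -1))    # missed clusters
    return tp, fp, fn
```

Precision, Recall, and F1 then follow directly from the three totals.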