There are three different evaluation scripts available in the MDETR repo:
| Evaluation code | `eval_clevr.py` | `eval_gqa.py` | `eval_lvis.py` |
|---|---|---|---|
| Notes | Dumps the model's predictions on an arbitrary split of CLEVR/CoGenT/CLEVR-Humans | Evaluates the model's predictions on the GQA (visual question answering) dataset and dumps the results to a file | Performs object detection evaluation on the LVIS (Large Vocabulary Instance Segmentation) dataset |
The initial analysis suggests using `eval_lvis.py`, as it is intended for detection, whereas the other two target visual question answering and their evaluation metrics depend on the type of the answers.
- Create a dataset file in `./GazeMDETR/datasets`; it can follow the structure of `flickr.py` or `vg.py` (a minimal skeleton is sketched after this list).
- `--resume` from the model checkpoint used in the demo code.
- Using the `GazeMDETR_eval.py` file, try to calculate the IoU and recall (taking hints from `flickr_eval.py`).
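As a rough illustration of the first step, the sketch below shows what such a dataset file could look like, assuming a plain PyTorch `Dataset` that loads an image and returns it together with a target dictionary holding the ground-truth boxes and the caption. The class name, annotation layout, and field names are all hypothetical; the real file should mirror `flickr.py`/`vg.py` from the MDETR repo.

```python
import json
from pathlib import Path

import torch
from PIL import Image
from torch.utils.data import Dataset


class GazeMDETRDetection(Dataset):
    """Hypothetical dataset skeleton; annotation layout and field names are assumptions."""

    def __init__(self, img_folder, ann_file, transforms=None):
        self.img_folder = Path(img_folder)
        # assumed format: a list of {"image": ..., "caption": ..., "boxes": [[x1, y1, x2, y2], ...]}
        with open(ann_file) as f:
            self.annotations = json.load(f)
        self.transforms = transforms

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        ann = self.annotations[idx]
        img = Image.open(self.img_folder / ann["image"]).convert("RGB")
        target = {
            "boxes": torch.as_tensor(ann["boxes"], dtype=torch.float32),
            "caption": ann["caption"],
        }
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target
```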
Considering the information above, the recall metric will be used. However, since GazeMDETR is meant to output only one bounding box (the one with the highest confidence), only recall@1 will be considered, with an adjustable IoU threshold.
Since the detection label is taken from the given prompt, and the prompt categories we use mention only one object per sentence, there will be no cases with wrong labels. Moreover, since the model always provides at least one prediction, the case of having no bounding box at all is also eliminated. Therefore, the false negatives are exactly the cases with IoU below the IoU threshold.
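Under these assumptions, recall@1 reduces to the fraction of test samples whose single predicted box overlaps the ground-truth box with IoU at or above the threshold. A minimal sketch of that computation (function and variable names are hypothetical; boxes are assumed to be in `[x1, y1, x2, y2]` format):

```python
import numpy as np

def recall_at_1(pred_boxes, gt_boxes, iou_thresh=0.5):
    """Fraction of samples whose single predicted box matches the ground truth.

    pred_boxes, gt_boxes: arrays of shape (N, 4) in [x1, y1, x2, y2] format,
    one predicted box and one ground-truth box per sample.
    """
    pred_boxes = np.asarray(pred_boxes, dtype=float)
    gt_boxes = np.asarray(gt_boxes, dtype=float)

    # intersection rectangle per sample
    x1 = np.maximum(pred_boxes[:, 0], gt_boxes[:, 0])
    y1 = np.maximum(pred_boxes[:, 1], gt_boxes[:, 1])
    x2 = np.minimum(pred_boxes[:, 2], gt_boxes[:, 2])
    y2 = np.minimum(pred_boxes[:, 3], gt_boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)

    area_pred = (pred_boxes[:, 2] - pred_boxes[:, 0]) * (pred_boxes[:, 3] - pred_boxes[:, 1])
    area_gt = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    iou = inter / (area_pred + area_gt - inter)

    # a prediction counts as a true positive iff its IoU clears the threshold
    return float((iou >= iou_thresh).mean())
```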
From `flickr_eval.py`, the computation of recall@k can be extracted:
```python
# ious[i, j]: IoU between the i-th predicted box and the j-th ground-truth box
ious = box_iou(np.asarray(cur_boxes), np.asarray(target_boxes))
for k in self.topk:
    maxi = 0
    if k == -1:
        # k == -1 uses all predictions (upper bound)
        maxi = ious.max()
    else:
        assert k > 0
        # best IoU among the first k predictions (assumed sorted by confidence)
        maxi = ious[:k].max()
    if maxi >= self.iou_thresh:
        recall_tracker.add_positive(k, "all")
        for phrase_type in phrase["phrase_type"]:
            recall_tracker.add_positive(k, phrase_type)
    else:
        recall_tracker.add_negative(k, "all")
        for phrase_type in phrase["phrase_type"]:
            recall_tracker.add_negative(k, phrase_type)
```
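To make the behaviour of this loop concrete, the toy example below runs the same top-k check on made-up boxes, with `torchvision.ops.box_iou` standing in for the MDETR helper (all numbers are invented for illustration). Since GazeMDETR keeps only the single most confident box, the loop effectively collapses to one IoU comparison, i.e. recall@1.

```python
import torch
from torchvision.ops import box_iou

iou_thresh = 0.5
topk = [1, 5, -1]  # -1 means "use all predictions" (upper bound)

# predictions sorted by confidence (best first), and one ground-truth box
cur_boxes = torch.tensor([[12., 10., 48., 60.], [0., 0., 20., 20.], [30., 30., 90., 90.]])
target_boxes = torch.tensor([[10., 10., 50., 60.]])

ious = box_iou(cur_boxes, target_boxes)  # shape (num_preds, num_targets)
for k in topk:
    maxi = ious.max() if k == -1 else ious[:k].max()
    hit = bool(maxi >= iou_thresh)
    print(f"recall@{k}: {'positive' if hit else 'negative'} (best IoU = {float(maxi):.2f})")
```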
Since the demo code has already been modified to run on the test data, the evaluation is integrated into the demo code as well, with different flags covering the different use cases:
```python
import argparse

parser = argparse.ArgumentParser(description='Caption format selection and evaluation selection')
parser.add_argument('-cc', '--caption_category', type=str, choices=['A', 'B', 'C', 'D', 'E'], default='A', help='Specify a value (A, B, C, D, E) to determine the caption category. A:The, B:This is a, C:Look at the, D:Point at the, E:Pass the')
parser.add_argument('-cd', '--caption_details', type=int, choices=[1, 2, 3, 4], default=1, help='Specify a detail level as (1, 2, 3, 4) to determine the caption details. 1:pose+color+name+placement, 2:pose+name+placement, 3:color+name, 4:name')
# BooleanOptionalAction (Python 3.9+) avoids the type=bool pitfall where any non-empty string parses as True
parser.add_argument('-eval', '--evaluate', action=argparse.BooleanOptionalAction, default=True, help='Specify if you want to evaluate the output in terms of IoU')
parser.add_argument('-sf', '--save_figures', action=argparse.BooleanOptionalAction, default=True, help='Specify if you want to save the generated figures for heatmaps and final selections')
parser.add_argument('-vf', '--visualize_figures', action=argparse.BooleanOptionalAction, default=True, help='Specify if you want to visualize the generated figures for heatmaps and final selections')
parser.add_argument('-iou', '--iou_thresh', type=float, default=0.5, help='Specify the IoU threshold for the evaluations')
args = parser.parse_args()
```
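For clarity, the snippet below sketches how a prompt could be assembled from `caption_category` and `caption_details` according to the help strings above. This is a hypothetical reconstruction, not necessarily how the GazeMDETR demo builds its prompts, and the attribute names (`pose`, `color`, `name`, `placement`) are assumptions.

```python
# Hypothetical reconstruction of the prompt from the two caption flags;
# the actual GazeMDETR demo may assemble it differently.
CATEGORY_PREFIX = {"A": "The", "B": "This is a", "C": "Look at the", "D": "Point at the", "E": "Pass the"}

def build_caption(category, detail, pose, color, name, placement):
    prefix = CATEGORY_PREFIX[category]
    if detail == 1:    # pose + color + name + placement
        body = f"{pose} {color} {name} {placement}"
    elif detail == 2:  # pose + name + placement
        body = f"{pose} {name} {placement}"
    elif detail == 3:  # color + name
        body = f"{color} {name}"
    else:              # name only
        body = name
    return f"{prefix} {body}"

# e.g. build_caption(args.caption_category, args.caption_details,
#                    "standing", "red", "mug", "on the table")
```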
All the required functions are retrieved from `flickr_eval.py`, modified for the needs of GazeMDETR, and collected in `GazeMDETR_eval_util.py`.
Check the instructions on the MDETR repo to find the proper validation code.