shivahanifi / GazeMDETR


Choose Validation code from MDETR repo and adjust input data accordingly #17

Open shivahanifi opened 10 months ago

shivahanifi commented 10 months ago

Check the instructions on the MDETR repo to find the proper validation code.

  1. Adapt the test data to the evaluation code
  2. Make sure to use the right model (how to call the model, etc.); see the loading sketch below.
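As a starting point for point 2, here is a minimal sketch of loading a pretrained MDETR model through torch.hub, in the spirit of the official MDETR demo (the hub entry name `mdetr_efficientnetB5` and the `return_postprocessor` flag follow the MDETR README; whichever checkpoint the GazeMDETR demo actually uses should be substituted):

```python
import torch

# Sketch: load a pretrained MDETR model from torch.hub, as in the MDETR demo.
# The hub entry name is an assumption; swap in the checkpoint used by the
# GazeMDETR demo code if it differs.
model, postprocessor = torch.hub.load(
    'ashkamath/mdetr:main',
    'mdetr_efficientnetB5',
    pretrained=True,
    return_postprocessor=True,
)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device).eval()  # inference only; no training needed here
```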
shivahanifi commented 9 months ago

There are three different evaluation scripts available in the MDETR repo.

| Evaluation code | Notes |
| --- | --- |
| eval_clevr.py | Dumps the model's predictions on an arbitrary split of CLEVR/CoGenT/CLEVR-Humans. |
| eval_gqa.py | Evaluates the model's predictions on the GQA (visual question answering) dataset and dumps the results to a file. |
| eval_lvis.py | Object detection evaluation on the LVIS (Large Vocabulary Instance Segmentation) dataset. |

The initial analysis shows that eval_lvis.py is the better choice, as it is intended for detection, while the other two mostly target visual question answering and their evaluation metrics depend on the type of the answers.

Actions to be taken

  1. Create a specific Python script for the test set, alongside the files in ./GazeMDETR/datasets. The new file can follow the structure of flickr.py or vg.py (a possible skeleton is sketched below).
  2. In the evaluation code, disable the training part and make it --resume from the model used in the demo code.
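For point 1, a possible skeleton for the test-set script, kept deliberately generic (the class name, annotation keys, and directory layout are assumptions; the real file should mirror how flickr.py and vg.py build on the MDETR dataset base classes and transforms):

```python
from pathlib import Path

import torch
from PIL import Image
from torch.utils.data import Dataset


class GazeMDETRTestSet(Dataset):
    """Hypothetical test-set loader mirroring the flickr.py/vg.py structure:
    each sample is an (image, caption, ground-truth box) triple."""

    def __init__(self, img_dir, annotations, transforms=None):
        # `annotations` is assumed to be a list of dicts with keys
        # 'file_name', 'caption' and 'bbox' ([x1, y1, x2, y2]); adapt it
        # to the actual annotation files of the GazeMDETR test data.
        self.img_dir = Path(img_dir)
        self.annotations = annotations
        self.transforms = transforms

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        ann = self.annotations[idx]
        img = Image.open(self.img_dir / ann['file_name']).convert('RGB')
        target = {
            'caption': ann['caption'],
            'boxes': torch.tensor([ann['bbox']], dtype=torch.float32),
        }
        if self.transforms is not None:
            # MDETR-style transforms operate on (image, target) pairs
            img, target = self.transforms(img, target)
        return img, target
```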
shivahanifi commented 9 months ago

Using the GazeMDETR_eval.py file, try to calculate the IoU and recall (hints from flickr_eval.py).
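A minimal sketch of the IoU computation needed here, assuming boxes in [x1, y1, x2, y2] format (on batched tensors, torchvision.ops.box_iou could be used instead):

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes in [x1, y1, x2, y2] format."""
    # Intersection rectangle
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    # Union = area_a + area_b - intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```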

shivahanifi commented 9 months ago

Important parts from the paper:

shivahanifi commented 9 months ago

Evaluation metric to be used for GazeMDETR

Considering the information above, the recall metric will be used. However, since GazeMDETR aims to output only one bounding box (the one with the highest confidence), only recall@1 will be considered, with an adjustable IoU threshold.

Since the detection label is taken from the given prompt, and since each of our prompt categories mentions only one object in the sentence, there are no cases with wrong labels. Moreover, since the model always provides at least one prediction, the case of no bounding box at all is also eliminated. Therefore, the false negatives are exactly the cases with IoU below the IoU threshold.

QUESTION: isn't recall equal to precision in this case?
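In code, the metric above reduces to counting how many top-1 predictions clear the IoU threshold. A sketch, reusing the hypothetical box_iou helper from the earlier comment and assuming one ground-truth box per test sample:

```python
def recall_at_1(pred_boxes, gt_boxes, iou_thresh=0.5):
    """recall@1 over paired lists of top-scoring predicted boxes and
    ground-truth boxes, both in [x1, y1, x2, y2] format."""
    hits = sum(box_iou(p, g) >= iou_thresh for p, g in zip(pred_boxes, gt_boxes))
    return hits / len(gt_boxes)
```

Note that with exactly one prediction and one ground-truth box per sample, the number of predictions equals the number of targets, so the same ratio is also precision@1.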

shivahanifi commented 9 months ago

Code for evaluation

Since the demo code has already been modified to run on the test data, the evaluation is integrated into the demo code as well, using different flags for the different use cases.

```python
import argparse

def str2bool(v):
    # argparse's type=bool treats any non-empty string (e.g. "False") as True,
    # so the boolean flags are parsed explicitly instead.
    return str(v).lower() in ('true', '1', 'yes')

parser = argparse.ArgumentParser(description='Caption format selection and evaluation selection')
parser.add_argument('-cc', '--caption_category', type=str, choices=['A', 'B', 'C', 'D', 'E'], default='A', help='Specify a value (A, B, C, D, E) to determine the caption category. A: The, B: This is a, C: Look at the, D: Point at the, E: Pass the')
parser.add_argument('-cd', '--caption_details', type=int, choices=[1, 2, 3, 4], default=1, help='Specify a detail level (1, 2, 3, 4) to determine the caption details. 1: pose+color+name+placement, 2: pose+name+placement, 3: color+name, 4: name')
parser.add_argument('-eval', '--evaluate', type=str2bool, default=True, help='Specify whether to evaluate the output in terms of IoU')
parser.add_argument('-sf', '--save_figures', type=str2bool, default=True, help='Specify whether to save the generated figures for heatmaps and final selections')
parser.add_argument('-vf', '--visualize_figures', type=str2bool, default=True, help='Specify whether to visualize the generated figures for heatmaps and final selections')
parser.add_argument('-iou', '--iou_thresh', type=float, default=0.5, help='Specify the IoU threshold for the evaluations')
args = parser.parse_args()
```
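For example, passing `-cc B -cd 3 -iou 0.5` to the modified demo selects captions of the form "This is a <color> <name>" and scores the top box against a 0.5 IoU threshold.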

All the required functions are taken from flickr_eval.py, modified for the needs of GazeMDETR, and collected in GazeMDETR_eval_util.py.