suprosanna / relationformer

Apache License 2.0

Issue about the evaluation #12

Open JosephChenHub opened 1 year ago

JosephChenHub commented 1 year ago

Hi, first of all, thank you for your contribution. However, I think there are some issues with the evaluation.

        if self.mode=='sgdet':
            iou_overlap = bbox_overlaps(gt_boxes, pred_boxes)
            if self.use_gt_filter:
                idx = torch.where(iou_overlap >= 0.5)
                valid_rels_idx = np.asarray([i for i, rel in enumerate(pred_rels) if (rel[0] in idx[1]) and (rel[1] in idx[1])]) #filter the junk detections
                if len(valid_rels_idx)>=1:
                    pred_rels = pred_rels[valid_rels_idx,:]
                    predicate_scores = predicate_scores[valid_rels_idx]
            if self.sort_only_by_rel:
                sorted_rel_idx = np.argsort(predicate_scores)[::-1]
                pred_rels = pred_rels[sorted_rel_idx]
                predicate_scores = predicate_scores[sorted_rel_idx]
        pred_to_gt, pred_5ples, rel_scores, sort_idx = evaluate_recall(
                    gt_rels, gt_boxes, gt_classes,
                    pred_rels, pred_boxes, pred_classes,
                    predicate_scores, obj_scores, phrdet= self.mode=='phrdet', vis=vis,
                    **kwargs)

In "SGDet" mode, `use_gt_filter` discards every predicted relation whose subject or object box does not overlap a ground-truth box with IoU >= 0.5. However, ground truth is not available at test time, so using it to filter predictions makes the recall computation unfair. I have not found any other codebase that does this (https://github.com/rowanz/neural-motifs/blob/master/lib/evaluation/sg_eval.py, https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch/blob/master/maskrcnn_benchmark/data/datasets/evaluation/vg/sgg_eval.py).
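To make the concern concrete, here is a toy sketch (not the repository's code) of how such a filter can change Recall@K. The helper names `iou_matrix` and `recall_at_k`, the boxes, and the scores are all made up for illustration, and the matching rule is simplified: a prediction counts as a hit if its predicate matches and both boxes overlap the ground-truth boxes with IoU >= 0.5, ignoring object classes.

```python
import numpy as np

def iou_matrix(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4) in (x1, y1, x2, y2) format."""
    lt = np.maximum(a[:, None, :2], b[None, :, :2])
    rb = np.minimum(a[:, None, 2:], b[None, :, 2:])
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def recall_at_k(gt_rels, gt_boxes, pred_rels, pred_boxes, scores, k, use_gt_filter=False):
    """Simplified Recall@K over (subject_idx, object_idx, predicate) triples."""
    iou = iou_matrix(gt_boxes, pred_boxes)  # shape (num_gt, num_pred)
    if use_gt_filter:
        # Hypothetical re-creation of the filter: keep only relations whose
        # subject AND object boxes overlap some ground-truth box.
        matched = np.where((iou >= 0.5).any(axis=0))[0]
        keep = np.isin(pred_rels[:, 0], matched) & np.isin(pred_rels[:, 1], matched)
        pred_rels, scores = pred_rels[keep], scores[keep]
    top = pred_rels[np.argsort(-scores)[:k]]  # top-k relations by score
    hits = 0
    for s, o, p in gt_rels:
        if any(pp == p and iou[s, ps] >= 0.5 and iou[o, po] >= 0.5 for ps, po, pp in top):
            hits += 1
    return hits / len(gt_rels)

# Toy data: predicted boxes 0 and 1 match the ground truth, boxes 2 and 3 are junk.
gt_boxes = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)
pred_boxes = np.array([[0, 0, 10, 10], [20, 20, 30, 30],
                       [50, 50, 60, 60], [70, 70, 80, 80]], dtype=float)
gt_rels = np.array([[0, 1, 3]])
pred_rels = np.array([[2, 3, 5],   # confident relation between junk boxes
                      [0, 1, 3]])  # correct relation with a lower score
scores = np.array([0.9, 0.6])

r_plain = recall_at_k(gt_rels, gt_boxes, pred_rels, pred_boxes, scores, k=1)
r_filtered = recall_at_k(gt_rels, gt_boxes, pred_rels, pred_boxes, scores, k=1,
                         use_gt_filter=True)
print(r_plain, r_filtered)  # 0.0 1.0
```

In this toy case the ground-truth filter removes the confident false positive before ranking, so the correct relation moves into the top-1 slot and Recall@1 goes from 0 to 1, even though the model's predictions are unchanged.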

Could you release the checkpoint? Also, the code does not reproduce the results reported in the paper. I suspect that the evaluation is unfair and incorrect, and that the true performance is not as good as reported. I have run experiments with your code and fixed some bugs, but that does not resolve the concerns above.