lightaime / deep_gcns_torch

Pytorch Repo for DeepGCNs (ICCV'2019 Oral, TPAMI'2021), DeeperGCN (arXiv'2020) and GNN1000(ICML'2021): https://www.deepgcns.org
MIT License

Evaluation metrics for the PartNet dataset are different from the original paper #80

Closed Hippogriff closed 3 years ago

Hippogriff commented 3 years ago

The evaluation metrics for the PartNet dataset experiments are different from the original paper's: https://github.com/daerduoCarey/partnet_seg_exps/blob/master/exps/sem_seg_pointcnn/test_general_seg.py#L114

There are a few points:

  1. The computation of shape IoU in your method will be wrong when the batch size is greater than 1, because (i_1 + i_2) / (u_1 + u_2) is not equal to i_1 / u_1 + i_2 / u_2, where i is the intersection and u is the union. Your code computes the former, pooled over the batch, here:
    I = np.sum(np.logical_and(cur_pred_mask, cur_gt_mask), dtype=np.float32)
    U = np.sum(np.logical_or(cur_pred_mask, cur_gt_mask), dtype=np.float32)
    cur_shape_iou_tot += I / U
  2. With a batch size of 2, when you check whether U > 0, it may be that the union is non-zero for one shape and zero for the other, but your code will include points from both shapes in the calculation. This changes the part mIoU metric as well.
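
The mismatch in point 1 can be checked with a hypothetical toy example (the intersection/union values below are made up for illustration):

```python
# Two shapes, one class. Shape 1: I=1, U=2 (IoU 0.5); shape 2: I=1, U=6 (IoU 1/6).
i1, u1 = 1.0, 2.0
i2, u2 = 1.0, 6.0

# Pooling the masks over the batch computes a single ratio of sums.
pooled = (i1 + i2) / (u1 + u2)            # 0.25

# Averaging per-shape IoUs, as the PartNet evaluation does, gives a different number.
per_shape_mean = (i1 / u1 + i2 / u2) / 2  # ~0.333

print(pooled, per_shape_mean)
```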

Please let me know if my understanding is not correct.

Hippogriff commented 3 years ago

I think the test function should be like this:

import numpy as np
import torch
from tqdm import tqdm

def test(model, loader, opt):
    part_intersect = np.zeros(opt.n_classes, dtype=np.float32)
    part_union = np.zeros(opt.n_classes, dtype=np.float32)
    model.eval()

    shape_iou_tot = 0.
    shape_iou_cnt = 0.

    for i, data in enumerate(tqdm(loader)):

        data = data.to(opt.device)
        inputs = data.pos.transpose(2, 1).unsqueeze(3)
        gt = data.y
        with torch.no_grad():
            out = model(inputs.detach())
        pred = out.max(dim=1)[1]
        batch_size = pred.shape[0]
        for b in range(batch_size):
            pred_np = pred[b].cpu().numpy()
            target_np = gt[b].cpu().numpy()

            cur_shape_iou_tot = 0.0
            cur_shape_iou_cnt = 0

            for cl in range(opt.n_classes):
                cur_gt_mask = (target_np == cl)
                cur_pred_mask = (pred_np == cl)

                I = np.sum(np.logical_and(cur_pred_mask, cur_gt_mask), dtype=np.float32)
                U = np.sum(np.logical_or(cur_pred_mask, cur_gt_mask), dtype=np.float32)

                if U > 0: # or if U > 0 or I > 0:
                    part_intersect[cl] += I
                    part_union[cl] += U

                    cur_shape_iou_tot += I / U
                    cur_shape_iou_cnt += 1.

            if cur_shape_iou_cnt > 0:
                cur_shape_miou = cur_shape_iou_tot / cur_shape_iou_cnt
                shape_iou_tot += cur_shape_miou
                shape_iou_cnt += 1.

    shape_mIoU = shape_iou_tot / shape_iou_cnt
    part_iou = np.divide(part_intersect[1:], part_union[1:])
    mean_part_iou = np.nanmean(part_iou)
    return mean_part_iou, shape_mIoU
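The per-shape inner loop above can be exercised on its own with synthetic labels (the arrays and n_classes below are made up for illustration, not taken from the repo):

```python
import numpy as np

n_classes = 3
pred_np   = np.array([0, 1, 1, 2])  # predicted part label per point
target_np = np.array([0, 1, 2, 2])  # ground-truth part label per point

cur_shape_iou_tot = 0.0
cur_shape_iou_cnt = 0
for cl in range(n_classes):
    cur_gt_mask = (target_np == cl)
    cur_pred_mask = (pred_np == cl)

    I = np.sum(np.logical_and(cur_pred_mask, cur_gt_mask), dtype=np.float32)
    U = np.sum(np.logical_or(cur_pred_mask, cur_gt_mask), dtype=np.float32)

    if U > 0:  # skip classes absent from both prediction and ground truth
        cur_shape_iou_tot += I / U
        cur_shape_iou_cnt += 1

# class 0: IoU 1.0; class 1: IoU 0.5; class 2: IoU 0.5
shape_miou = cur_shape_iou_tot / cur_shape_iou_cnt
print(shape_miou)
```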
guochengqian commented 3 years ago

Dear @Hippogriff, thank you very much for pointing this out, and sorry for the mistake. The batch size does affect the shape IoU. In the experiments of our paper, the reported metric is part_iou, which is not influenced by the batch size.
I have modified the code according to your suggestion.
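
The batch-size invariance of part_iou follows from it pooling intersections and unions over the whole dataset before dividing, so the grouping of shapes into batches cannot change the result. A small sketch with made-up per-shape counts for one class:

```python
import numpy as np

# Hypothetical per-shape intersection/union counts for a single part class.
I_per_shape = np.array([1.0, 3.0, 0.0, 2.0])
U_per_shape = np.array([2.0, 4.0, 0.0, 5.0])

# Batch size 1: accumulate shape by shape.
acc_i = acc_u = 0.0
for i, u in zip(I_per_shape, U_per_shape):
    acc_i += i
    acc_u += u

# Batch size 2: accumulate pair by pair.
acc_i2 = acc_u2 = 0.0
for k in range(0, len(I_per_shape), 2):
    acc_i2 += I_per_shape[k] + I_per_shape[k + 1]
    acc_u2 += U_per_shape[k] + U_per_shape[k + 1]

# Both groupings produce the same pooled part IoU.
print(acc_i / acc_u == acc_i2 / acc_u2)
```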