haoyu94 / RoITr

Rotation-Invariant Transformer for Point Cloud Matching
MIT License

Question regarding RMSE computation #5

Closed Parskatt closed 11 months ago

Parskatt commented 12 months ago

In the supplementary, the RMSE is written with the square root taken after the mean, i.e. `RMSE = sqrt(mean(||T(p_i) - q_i||^2))` over the correspondences.

While in the code it's computed like this: https://github.com/haoyu94/RoITr/blob/393539d6709c55b2465231cccb7b951f736a5c72/registration/evaluate_registration_c2f_rotated.py#L87

These are not equivalent: in the code, the square root is taken before the mean, so it computes the mean point distance rather than the RMSE.

Which is correct, the code or the paper?
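
To make the difference concrete, a toy numpy sketch (my own, not the repo's code):

    import numpy as np

    rng = np.random.default_rng(0)
    d = rng.normal(size=(1000, 3))         # hypothetical per-point residual vectors
    sq = np.sum(d ** 2, axis=1)            # squared distance per point

    rmse_paper = np.sqrt(np.mean(sq))      # sqrt AFTER the mean: the paper's RMSE
    mean_dist = np.mean(np.sqrt(sq))       # sqrt BEFORE the mean: what the code computes

    # By Jensen's inequality mean_dist <= rmse_paper, with equality only when
    # all distances are equal, so the two numbers generally differ.
    print(rmse_paper, mean_dist)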

Parskatt commented 12 months ago

Ok, diving deeper into this, it seems no one is actually computing a true RMSE. The approximation used in Predator:

https://github.com/prs-eth/OverlapPredator/blob/770c3063399f08b3836935212ab4c84d355b4704/lib/benchmark.py#L57

Comes from here: http://redwood-data.org/indoor/registration.html
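
For reference, my reading of what that approximation computes, as a rough sketch (names are mine, not a verbatim copy; see the linked Predator file for the exact code): the residual transform between estimate and ground truth is reduced to a 6-vector of translation error plus the imaginary part of the rotation quaternion, and a quadratic form with the information matrix from gt.info stands in for the squared RMSE.

    import numpy as np
    from scipy.spatial.transform import Rotation

    def approx_sq_rmse(rel_transform, info):
        """Sketch of the Redwood-style approximate squared RMSE.

        rel_transform: 4x4 residual transform, inv(gt_pose) @ est_pose.
        info: 6x6 information matrix from the gt.info file.
        """
        t = rel_transform[:3, 3]
        q = Rotation.from_matrix(rel_transform[:3, :3]).as_quat()  # (x, y, z, w)
        xi = np.concatenate([t, q[:3]])  # 6-DoF pose perturbation
        # quadratic form, normalized by info[0, 0] (the fragment's point
        # count, if I read the Redwood convention right)
        return float(xi @ info @ xi) / info[0, 0]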

Parskatt commented 12 months ago

Conclusion so far: No one has any idea what metric is being computed lol

Parskatt commented 11 months ago

This is mostly rambling so I'll close it

yaorz97 commented 11 months ago

I am also confused by this point. Many works, such as CoFiNet and GeoTransformer, follow Predator in computing the approximate RMSE. However, the RR evaluation metric for rotated 3DMatch seems to differ from the one for 3DMatch?

Parskatt commented 11 months ago

@Pterosaur-Yao yes, although I think the rotated metric makes a lot of sense: it's basically the average distance. But of course that makes it hard to compare the rotated and non-rotated versions.

yaorz97 commented 11 months ago

@Parskatt hello, have you tested RoITr on rotated 3DMatch with the approximate RMSE? I find that computing the approximate RMSE requires the relative error between the estimated and ground-truth transformations, so we can compute the RR of the rotated dataset with the same metric?

    p = computeTransformationErr(np.linalg.inv(gt[gt_idx, :, :]) @ pose, gt_info[gt_idx, :, :])
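
Then registration recall on the rotated set could be computed the same way, something like this (a rough sketch reusing the approx_sq_rmse sketch above; 0.2 m is the usual Redwood acceptance threshold):

    import numpy as np

    # hypothetical inputs: ground-truth pose, estimated pose, 6x6 info matrix
    gt_pose, est_pose = np.eye(4), np.eye(4)
    gt_info = np.eye(6) * 1000.0

    rel = np.linalg.inv(gt_pose) @ est_pose  # residual transform
    p = approx_sq_rmse(rel, gt_info)         # approximate squared RMSE
    accepted = p <= 0.2 ** 2                 # counts toward registration recall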

Parskatt commented 11 months ago

@Pterosaur-Yao I haven't tried, no, let me know the results :)

yaorz97 commented 11 months ago

@Parskatt I may test it later, because I found a significant drop when testing GeoTransformer on rotated 3DMatch under the approximate RMSE.

Parskatt commented 11 months ago

@Pterosaur-Yao that makes sense, since RoITr gains about 3 points on the rotated benchmark. What numbers did you get for GeoTransformer, about 68-69?

yaorz97 commented 11 months ago

GeoTransformer RR:

| Benchmark | Original | Rotated |
| --- | --- | --- |
| 3DMatch | 92.5 | 89.3 |
| 3DLoMatch | 74.2 | 67.3 |

Parskatt commented 11 months ago

@Pterosaur-Yao sounds reasonable, also shows that rotations still matter :D

yaorz97 commented 11 months ago

@haoyu94 @Parskatt , to evaluate Rotated 3DMatch with the approximate RMSE, I modified the evaluation file to save the ground-truth trajectory:

    gt_folder = f'configs/benchmarks/{whichbenchmark}'
    exp_dir = f'{exp_dir}/{whichbenchmark}/{n_points}'
    if (not os.path.exists(exp_dir)):
        os.makedirs(exp_dir)
    results = dict()
    results['w_mutual'] = {'inlier_ratios': [], 'distances': []}
    results['wo_mutual'] = {'inlier_ratios': [], 'distances': []}
    tsfm_est = []
    tsfm_gt = []
    inlier_ratio_list = []

    coarse_sample = 256
    idx = 0
    for eachfile in tqdm(desc):

        ######################################################
        # 1. take the nodes and descriptors
        print(eachfile)
        data = torch.load(eachfile)
        src_pcd, tgt_pcd = data['src_pcd'], data['tgt_pcd']
        src_nodes, tgt_nodes = data['src_nodes'], data['tgt_nodes']
        src_feats, tgt_feats = data['src_node_desc'], data['tgt_node_desc']
        src_point_feats, tgt_point_feats = data['src_point_desc'], data['tgt_point_desc']
        rot, trans = data['rot'], data['trans']
        transform = np.eye(4)
        transform[:3, :3] = rot
        transform[:3, 3:4] = trans

        src_corr_pts, tgt_corr_pts = data['src_corr_pts'], data['tgt_corr_pts']
        confidence = data['confidence']
        ######################################################
        # 2. run ransac
        prob = confidence / torch.sum(confidence)
        print(confidence.shape[0])
        if prob.shape[0] > n_points:
            sel_idx = np.random.choice(prob.shape[0], n_points, replace=False, p=prob.numpy())
            # comment out the previous line and uncomment the next one to
            # switch the sampling strategy to top-k:
            # sel_idx = torch.topk(confidence, k=n_points)[1]
            src_corr_pts, tgt_corr_pts = src_corr_pts[sel_idx], tgt_corr_pts[sel_idx]
            confidence = confidence[sel_idx]

        correspondences = torch.from_numpy(np.arange(src_corr_pts.shape[0])[:, np.newaxis]).expand(-1, 2)
        tsfm_est.append(ransac_pose_estimation_correspondences(src_corr_pts, tgt_corr_pts, correspondences))
        tsfm_gt.append(transform)
        ######################################################
        # 3. calculate inlier ratios
        cur_inlier_ratio = get_inlier_ratio_correspondence(src_corr_pts, tgt_corr_pts, rot, trans, inlier_distance_threshold=0.1)
        inlier_ratio_list.append(cur_inlier_ratio)
        idx += 1

    tsfm_est = np.array(tsfm_est)
    tsfm_gt = np.array(tsfm_gt)
    ########################################
    # write the estimated and ground-truth trajectories
    write_est_trajectory(gt_folder, exp_dir, tsfm_est)
    write_est_trajectory(gt_folder, exp_dir, tsfm_gt, 'gt_save.log')

    ########################################
    # evaluate the results; FMR and inlier ratios are each averaged twice
    # (first within each scene, then across scenes)
    inlier_ratio_list = np.array(inlier_ratio_list)
    benchmark(exp_dir, gt_folder)
    split = get_scene_split(whichbenchmark)

    inliers = []
    fmrs = []
    inlier_ratio_thres = 0.05
    for ele in split:
        c_inliers = inlier_ratio_list[ele[0]:ele[1]]
        inliers.append(np.mean(c_inliers))
        fmrs.append((np.array(c_inliers) > inlier_ratio_thres).mean())
    with open(os.path.join(exp_dir, 'result'), 'a') as f:
        f.write(f'Inlier ratio: {np.mean(inliers):.3f} : +- {np.std(inliers):.3f}\n')
        f.write(f'Feature match recall: {np.mean(fmrs):.3f} : +- {np.std(fmrs):.3f}\n')


and by loading the saved ground-truth trajectory in the `benchmark` function:

    scenes = sorted(os.listdir(gt_folder))
    scene_names = [os.path.join(gt_folder, ele) for ele in scenes]

    re_per_scene = defaultdict(list)
    te_per_scene = defaultdict(list)
    re_all, te_all, precision, recall = [], [], [], []
    n_valids = []

    short_names = ['Kitchen', 'Home 1', 'Home 2', 'Hotel 1', 'Hotel 2', 'Hotel 3', 'Study', 'MIT Lab']
    with open(f'{est_folder}/result', 'w') as f:
        f.write(("Scene\t¦ prec.\t¦ rec.\t¦ re\t¦ te\t¦ samples\t¦\n"))

        for idx, scene in enumerate(scene_names):
            # ground-truth info: load the trajectory saved above instead of the
            # original gt.log, so rotated poses are evaluated consistently
            # gt_pairs, gt_traj = read_trajectory(os.path.join(scene, "gt.log"))
            gt_pairs, gt_traj = read_trajectory(os.path.join(est_folder, scenes[idx], "gt_save.log"))
            n_valid = 0
            for ele in gt_pairs:
                diff = abs(int(ele[0]) - int(ele[1]))
                n_valid += diff > 1
            n_valids.append(n_valid)

            n_fragments, gt_traj_cov = read_trajectory_info(os.path.join(scene, "gt.info"))

            # estimated info
            est_pairs, est_traj = read_trajectory(os.path.join(est_folder, scenes[idx], 'est.log'))

            temp_precision, temp_recall, c_flag = evaluate_registration(n_fragments, est_traj, est_pairs, gt_pairs,
                                                                        gt_traj, gt_traj_cov)

            # compute rotation/translation errors on the pairs flagged as valid (c_flag == 0)
            ext_gt_traj = extract_corresponding_trajectors(est_pairs, gt_pairs, gt_traj)
            re = rotation_error(torch.from_numpy(ext_gt_traj[:, 0:3, 0:3]),
                                torch.from_numpy(est_traj[:, 0:3, 0:3])).cpu().numpy()[np.array(c_flag) == 0]
            te = translation_error(torch.from_numpy(ext_gt_traj[:, 0:3, 3:4]),
                                   torch.from_numpy(est_traj[:, 0:3, 3:4])).cpu().numpy()[np.array(c_flag) == 0]

            re_per_scene['mean'].append(np.mean(re))
            re_per_scene['median'].append(np.median(re))
            re_per_scene['min'].append(np.min(re))
            re_per_scene['max'].append(np.max(re))

            te_per_scene['mean'].append(np.mean(te))
            te_per_scene['median'].append(np.median(te))
            te_per_scene['min'].append(np.min(te))
            te_per_scene['max'].append(np.max(te))

            re_all.extend(re.reshape(-1).tolist())
            te_all.extend(te.reshape(-1).tolist())

            precision.append(temp_precision)
            recall.append(temp_recall)

            f.write("{}\t¦ {:.3f}\t¦ {:.3f}\t¦ {:.3f}\t¦ {:.3f}\t¦ {:3d}¦\n".format(short_names[idx], temp_precision,
                                                                                    temp_recall, np.median(re),
                                                                                    np.median(te), n_valid))
            np.save(f'{est_folder}/{scenes[idx]}/flag.npy', c_flag)

        weighted_precision = (np.array(n_valids) * np.array(precision)).sum() / np.sum(n_valids)

        f.write("Mean precision: {:.3f}: +- {:.3f}\n".format(np.mean(precision), np.std(precision)))
        f.write("Weighted precision: {:.3f}\n".format(weighted_precision))

        f.write("Mean median RRE: {:.3f}: +- {:.3f}\n".format(np.mean(re_per_scene['median']),
                                                              np.std(re_per_scene['median'])))
        f.write("Mean median RTE: {:.3F}: +- {:.3f}\n".format(np.mean(te_per_scene['median']),
                                                              np.std(te_per_scene['median'])))
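
For anyone unfamiliar with the trajectory files: gt.log, est.log, and the gt_save.log written above follow the Redwood/3DMatch .log convention, i.e. per pair a "src_id tgt_id n_fragments" header line followed by the 4x4 transform. A minimal writer sketch under that assumption, with hypothetical names:

    import numpy as np

    def write_trajectory_log(path, pairs, poses):
        """Hypothetical helper: write a Redwood/3DMatch-style .log file."""
        with open(path, 'w') as f:
            for (i, j, n), T in zip(pairs, poses):
                f.write(f'{i}\t{j}\t{n}\n')  # header: pair ids + fragment count
                for row in np.asarray(T):    # four rows of the 4x4 transform
                    f.write('\t'.join(f'{v:.8f}' for v in row) + '\n')

    # e.g. an identity pose for fragment pair (0, 3) out of 60 fragments:
    write_trajectory_log('gt_save.log', [(0, 3, 60)], [np.eye(4)])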

The experimental results (with 5000 points) are 3DMatch (0.91) → Rotated 3DMatch (0.89) and 3DLoMatch (0.737) → Rotated 3DLoMatch (0.709). Is this reasonable?
[Result screenshots: 3DMatch, 3DLoMatch, Rotated 3DMatch, Rotated 3DLoMatch]

Parskatt commented 11 months ago

It seems pretty clear that the inlier ratio is identical, and yet the pose estimation gives worse results? That seems a little strange to me. I'm new to registration though, so I can't really say what's reasonable.

yaorz97 commented 11 months ago

The results on 3DMatch and 3DLoMatch are evaluated with the original code, and the results on the rotated datasets with the modified code. I also observed marginal drops on the 3DMatch and 3DLoMatch datasets. Can you reproduce the results reported in the paper? @Parskatt

Parskatt commented 11 months ago

@Pterosaur-Yao I haven't tried yet. I guess it's often the case that official code has some small issues that cause the numbers not to reproduce exactly.

Parskatt commented 10 months ago

@Pterosaur-Yao For me, evaluating the pretrained model on just the validation set also produces somewhat worse results than what the checkpoint itself reports.

Result reported by model:

    Successfully load pretrained model from pretrained/model_3dmatch.pth!
    Current best loss 3.092932455174558
    Current best c_loss 0.6709955289915159
    Current best f_loss 2.1981095421063648
    Current best PIR 0.684684693813324
    Current best IR 0.709432065486908

Eval results:

    Epoch: 0 loss: 3.2571 c_loss: 0.7007 f_loss: 2.5564 o_loss: 0.0000 PIR: 0.6680 IR: 0.6930

There might be some discrepancy in the exact eval params?