Obtaining pose confidence measurements

nianticlabs / mickey

[CVPR 2024 - Oral] Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences

https://nianticlabs.github.io/mickey/


Obtaining pose confidence measurements #5

Closed tianyilim closed 5 months ago

tianyilim commented 6 months ago

Hi, thanks for the interesting work. We'd like to use this in the context of detecting loop closures for visual SLAM.

Our dataset (indoor scenes of partially constructed buildings) does not line up well with what MicKey was trained on, so I don't expect MicKey to perform well out of the box. However, I'd like to quantify how bad the domain gap is.

On the website, you show a confidence metric for a pair of images.

Is it possible to obtain this number from the pose matcher? The output of the model from demo_inference.py is:

Data keys: dict_keys(['image0', 'image1', 'K_color0', 'K_color1', 'kps0_shape', 
'kps1_shape', 'depth0_map', 'depth1_map', 'down_factor', 'kps0', 'depth_kp0', 
'scr0', 'kps1', 'depth_kp1', 'scr1', 'scores', 'dsc0', 'dsc1', 'kp_scores', 'final_scores', 
'R', 't', 'inliers', 'inliers_list'])

Is the confidence in one of these output variables?

(a more general question: What do all of these mean?)

axelBarroso commented 5 months ago

Hello!

Thank you for your interest in our work!

Yes, the confidence can be extracted from those values. The confidence is computed as the ratio between the inliers and the total number of keypoint correspondences.

'inliers' is the soft inlier count. The total number of keypoint correspondences is the number of correspondences sampled during the Procrustes pose solver (see this line (NUM_SAMPLED_MATCHES)).

Hence, you can obtain such confidence by doing inliers / NUM_SAMPLED_MATCHES.
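As a minimal sketch of the ratio described above: the helper name `pose_confidence` is hypothetical (it is not part of the MicKey code base), and the default of 2048 is only a placeholder for whatever NUM_SAMPLED_MATCHES is set to in your configuration. The input is assumed to be data['inliers'] already converted to a plain Python float.

```python
# Placeholder value (assumption): replace with NUM_SAMPLED_MATCHES from your config.
NUM_SAMPLED_MATCHES = 2048


def pose_confidence(soft_inliers: float, num_sampled: int = NUM_SAMPLED_MATCHES) -> float:
    """Return soft_inliers / num_sampled (hypothetical helper, not MicKey API)."""
    if num_sampled <= 0:
        raise ValueError("num_sampled must be positive")
    return float(soft_inliers) / num_sampled


if __name__ == "__main__":
    # 314.5501 stands in for data['inliers'] converted to a Python float.
    print(f"{pose_confidence(314.5501):.4f}")  # prints 0.1536
```

Because the soft inlier count is a float, the result is a continuous value, which makes it convenient to compare across image pairs.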

All the other values are intermediate elements needed to compute the relative pose (R and t). For instance, 'depth0_map' is the depth map of the reference image (image0), and 'kps0' are the keypoints computed from image0.

Hope this helps. Please re-open the issue if you have any further questions!

tianyilim commented 5 months ago

Thanks for the info. I found that inliers is a single float value with shape (1, 1), which I guess is a float because of the soft inlier counting.

Meanwhile, inliers_list seems to contain the actual information about the inliers that would be forwarded to the Procrustes pose solver. Its shape is (N, 7), where N is close to the number of inliers (but not equal).

For example:

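NUM_SAMPLED_MATCHES = 2048  # should match the value used by the pose solver
# Soft inlier count: a float tensor of shape (1, 1), reduced to a scalar
num_inliers = data['inliers'].cpu().numpy().flatten()[0]
# Inliers for the first (only) image pair: array of shape (N, 7)
inliers_list = data['inliers_list'][0].cpu().numpy()
confidence = num_inliers / NUM_SAMPLED_MATCHES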
print(f"{confidence=}, {num_inliers=}, {inliers_list.shape=}")
# confidence=0.1535889208316803, num_inliers=314.5501, inliers_list.shape=(304, 7)
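To illustrate why a soft inlier count (314.55 here) need not equal the number of rows N in inliers_list (304 here), the sketch below compares a hard count with a soft count. It assumes a sigmoid-based soft score, a common choice in differentiable RANSAC-style pipelines; MicKey's actual scoring function, threshold, and sharpness may differ, and all names and values are hypothetical.

```python
import math


def soft_inlier_count(residuals, threshold, beta):
    """Sum of sigmoid(beta * (threshold - r)): each residual adds a value in (0, 1).

    Assumed scoring form for illustration only, not taken from MicKey.
    """
    return sum(1.0 / (1.0 + math.exp(-beta * (threshold - r))) for r in residuals)


def hard_inlier_count(residuals, threshold):
    """Count residuals strictly below the threshold."""
    return sum(1 for r in residuals if r < threshold)


if __name__ == "__main__":
    # Hypothetical residuals: two clear inliers, two near misses, one outlier.
    residuals = [0.01, 0.05, 0.11, 0.12, 0.50]
    print(hard_inlier_count(residuals, threshold=0.10))            # prints 2
    print(soft_inlier_count(residuals, threshold=0.10, beta=100))  # roughly 2.38
```

In this toy case the near misses contribute fractional weight, so the soft count exceeds the hard count, which matches the direction of the gap seen in the output above.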

axelBarroso commented 5 months ago

Yes, exactly.

The inliers_list with shape (N, 7) holds the N inliers, with their keypoint coordinates and their correspondence scores. See this line for more details.

Thanks!