naver / mast3r

Grounding Image Matching in 3D with MASt3R

Minimal example for Absolute Pose Estimation #29

tianyilim commented 3 months ago

Hi, I would like to sanity-check using mast3r for absolute pose estimation.

Here is some code adapted from the example usage in the README:

```python3
import cv2
import numpy as np

matches_im0, matches_im1, pts3d_im0, pts3d_im1, conf_im0, conf_im1, desc_conf_im0, desc_conf_im1 = \
    get_mast3r_output(...)  # call mast3r example script with custom images; all variables are np arrays

# use PnP from opencv; pointmaps are indexed [y, x], matches are (x, y)
retval, rvec, tvec = cv2.solvePnP(
    objectPoints=pts3d_im0[matches_im0[:, 1], matches_im0[:, 0], :],
    imagePoints=matches_im1.astype(np.float32),  # ensure same datatype for opencv
    cameraMatrix=scale_K,                        # original intrinsics scaled to the MASt3R output resolution (512, 384)
    distCoeffs=np.zeros((0,)),                   # no distortion assumed
    flags=cv2.SOLVEPNP_EPNP)
```
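
For reference, here is how I would turn the PnP output into a relative pose (a sketch, assuming `retval` is True; since `pts3d_im0` is expressed in image 0's camera frame, `rvec`/`tvec` give the cam1-from-cam0 transform):

```python3
# convert the PnP result into a 4x4 rigid transform (cam1-from-cam0)
R, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 rotation matrix
T_1_0 = np.eye(4)
T_1_0[:3, :3] = R
T_1_0[:3, 3] = tvec.ravel()

# pose of camera 1 expressed in camera 0's frame is the inverse
T_0_1 = np.linalg.inv(T_1_0)
```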

Is this the recommended way to extract absolute camera poses between the two images in Mast3r?

piuabramo commented 3 months ago

Hi @tianyilim, I'm working on the same function.

Can you share the get_mast3r_output full code and how to calculate scale_K value?

Thank you!

yocabon commented 3 months ago

In visloc.py, it's not done like this, though it could be, especially if you use MASt3R to estimate the intrinsics. Instead, it rescales the matches back to the original image size and runs PnP with the full-image intrinsics.

With cv2, we use cv2.SOLVEPNP_SQPNP, and you may want to remove low-confidence matches.
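
Something along these lines (a minimal sketch, not the actual visloc.py code; it assumes the variables from the snippet above plus placeholder values for the original resolution, the intrinsics, and the confidence threshold):

```python3
import cv2
import numpy as np

# placeholder values for illustration
CONF_THR = 1.001                       # hypothetical confidence threshold
W_orig, H_orig = 1920.0, 1080.0        # original image resolution
W_net, H_net = 512.0, 384.0            # MASt3R inference resolution
K_orig = np.array([[1000.0, 0.0, W_orig / 2],   # placeholder full-resolution
                   [0.0, 1000.0, H_orig / 2],   # intrinsics
                   [0.0, 0.0, 1.0]])

# drop low-confidence matches (here thresholding the pointmap confidences)
c0 = conf_im0[matches_im0[:, 1], matches_im0[:, 0]]
c1 = conf_im1[matches_im1[:, 1], matches_im1[:, 0]]
keep = (c0 > CONF_THR) & (c1 > CONF_THR)
m0, m1 = matches_im0[keep], matches_im1[keep]

# rescale the 2D matches from network resolution back to the original image
scale = np.array([W_orig / W_net, H_orig / H_net], dtype=np.float32)
pts2d = m1.astype(np.float32) * scale

# 3D points from image 0's pointmap (indexed [y, x])
pts3d = pts3d_im0[m0[:, 1], m0[:, 0], :].astype(np.float32)

# PnP at full resolution with SQPNP, robustified with RANSAC
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    pts3d, pts2d, K_orig, None,
    reprojectionError=5.0, flags=cv2.SOLVEPNP_SQPNP)
```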

tianyilim commented 3 months ago

@piuabramo Here you go:

I lightly modified the example code from the README to make get_mast3r_output. Of course, this script has to be placed in the MASt3R repository root.

get_mast3r_output function:

```python3
import numpy as np

import mast3r.utils.path_to_dust3r  # noqa -- adds the dust3r submodule to the path
from dust3r.inference import inference
from dust3r.utils.image import load_images
from mast3r.fast_nn import fast_reciprocal_NNs
from mast3r.model import AsymmetricMASt3R

DEVICE = 'cuda'
MODEL_NAME = "naver/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric"
BORDER = 3


def get_mast3r_output(img0: str, img1: str):
    """Run MASt3R on an image pair; return matches, pointmaps and confidences."""
    # Load model, run inference
    model = AsymmetricMASt3R.from_pretrained(MODEL_NAME).to(DEVICE)
    images = load_images([img0, img1], size=512)
    output = inference([tuple(images)], model, DEVICE, batch_size=1, verbose=False)

    # raw predictions
    view1, pred1 = output['view1'], output['pred1']
    view2, pred2 = output['view2'], output['pred2']

    desc1 = pred1['desc'].squeeze(0).detach()
    desc2 = pred2['desc'].squeeze(0).detach()

    # find 2D-2D matches between the two images
    matches_im0, matches_im1 = fast_reciprocal_NNs(
        desc1, desc2, subsample_or_initxy1=8,
        device=DEVICE, dist='dot', block_size=2**13)

    # ignore small border around the edge
    H0, W0 = view1['true_shape'][0]
    valid_matches_im0 = (matches_im0[:, 0] >= BORDER) & \
                        (matches_im0[:, 0] < int(W0) - BORDER) & \
                        (matches_im0[:, 1] >= BORDER) & \
                        (matches_im0[:, 1] < int(H0) - BORDER)

    H1, W1 = view2['true_shape'][0]
    valid_matches_im1 = (matches_im1[:, 0] >= BORDER) & \
                        (matches_im1[:, 0] < int(W1) - BORDER) & \
                        (matches_im1[:, 1] >= BORDER) & \
                        (matches_im1[:, 1] < int(H1) - BORDER)

    valid_matches = valid_matches_im0 & valid_matches_im1

    # matches are Nx2 (x, y) image coordinates
    matches_im0 = matches_im0[valid_matches]
    matches_im1 = matches_im1[valid_matches]

    # Convert the other outputs to numpy arrays
    pts3d_im0 = pred1['pts3d'].squeeze(0).detach().cpu().numpy()
    pts3d_im1 = pred2['pts3d_in_other_view'].squeeze(0).detach().cpu().numpy()
    conf_im0 = pred1['conf'].squeeze(0).detach().cpu().numpy()
    conf_im1 = pred2['conf'].squeeze(0).detach().cpu().numpy()
    desc_conf_im0 = pred1['desc_conf'].squeeze(0).detach().cpu().numpy()
    desc_conf_im1 = pred2['desc_conf'].squeeze(0).detach().cpu().numpy()

    return (matches_im0, matches_im1, pts3d_im0, pts3d_im1,
            conf_im0, conf_im1, desc_conf_im0, desc_conf_im1)
```
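
With the signature above, the call from my first post would look like this (paths are placeholders):

```python3
matches_im0, matches_im1, pts3d_im0, pts3d_im1, \
    conf_im0, conf_im1, desc_conf_im0, desc_conf_im1 = \
    get_mast3r_output('path/to/img0.png', 'path/to/img1.png')
```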

And for scale_K, I already know the intrinsics of my images, so it's just a matter of rescaling them to match the MASt3R output resolution.

scale_intrinsics function:

```python3
import numpy as np
from numpy.typing import NDArray


def scale_intrinsics(K: NDArray, prev_w: float, prev_h: float) -> NDArray:
    """Scale the intrinsics matrix from the original size to the MASt3R output size.

    Args:
        K (NDArray): 3x3 intrinsics matrix at the original resolution
        prev_w (float): Original image width
        prev_h (float): Original image height

    Returns:
        NDArray: Intrinsics matrix scaled to the (512, 384) MASt3R output
    """
    assert K.shape == (3, 3), f"Expected (3, 3), but got {K.shape=}"

    scale_w = 512.0 / prev_w  # width of the images in the MASt3R output
    scale_h = 384.0 / prev_h  # height of the images in the MASt3R output

    K_scaled = K.copy()
    K_scaled[0, 0] *= scale_w
    K_scaled[0, 2] *= scale_w
    K_scaled[1, 1] *= scale_h
    K_scaled[1, 2] *= scale_h

    return K_scaled
```
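
For example, with hypothetical intrinsics for a 1920x1080 camera, the scale_K used in the solvePnP call above would be:

```python3
K_orig = np.array([[1000.0, 0.0, 960.0],   # placeholder fx, cx
                   [0.0, 1000.0, 540.0],   # placeholder fy, cy
                   [0.0, 0.0, 1.0]])
scale_K = scale_intrinsics(K_orig, prev_w=1920.0, prev_h=1080.0)
```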

Hope this helps! :smile:

tianyilim commented 3 months ago

@yocabon Thanks for the input!

Out of curiosity, is scaling the matches back to the original size more accurate than doing PnP at MASt3R scale? I would think there's a trade-off in keypoint-location noise if the MASt3R resolution is lower than the original input resolution.

I will indeed try the SQPNP method and add some low-confidence match filtering as you suggested.