jdibenes / hl2ss

HoloLens 2 Sensor Streaming. Real-time streaming of HoloLens 2 sensor data over WiFi. Research Mode and External USB-C A/V supported.

VLC stereo depth estimation #120

Closed · Zulex closed this 4 months ago

Zulex commented 4 months ago

Hi, first of all thanks for this great repo, it is super helpful!

I am trying to do depth estimation of a single keypoint using both front VLC cameras. I am following the sample (sample_stereo_rectify.py) to undistort and rectify both images. Afterwards, for testing purposes, I manually pick a keypoint in both images (the locations of the points have been verified by drawing circles on the cv2 images) and use cv2.triangulatePoints. For P1 and P2 I use the saved stereo_rectification file created by the same script.

So I follow this procedure (code below):

  1. Load raw images
  2. Load calibration and rectification files created in sample_stereo_rectify
  3. Undistort and rectify images
  4. Create variables containing the keypoints: pts1 = np.array([x_l, y_l]).T, and the same for pts2
  5. Triangulate using cv2.triangulatePoints(stereo_rectification.P1, stereo_rectification.P2, pts_1, pts_2)
  6. Convert the 4D homogeneous coordinates to 3D points: points_3d = points_4d_homogeneous[:3] / points_4d_homogeneous[3]
  7. Get T1 from the left camera to use as the reference point
  8. Calculate the distance from the reference point to the keypoint: distance = np.sqrt(np.sum((points_3d - T)**2))

However, I get a value of ~3 (I assume in mm), while I was expecting anywhere between 20 and 50 cm. What should I be doing differently? Should I be creating image_to_world matrices to use as P1 and P2 and then use (0,0,0) as the reference point?

As a sanity check, this is the P2 that comes out of stereo_rectification:

[[     360.56           0      235.95     -35.963]
 [          0      360.56      303.48           0]
 [          0           0           1           0]]

P1 is exactly the same but without the translation. I also notice that whenever I change my pts_1 or pts_2 values, the resulting distance stays the same, which makes me suspect that something is going wrong with the projection matrices. However, on visual inspection of the images, the rectification has been done correctly.
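If I read P2 correctly, the -35.963 entry should be -fx * baseline, which gives a baseline of 35.963 / 360.56 ≈ 0.0997 m, i.e. about 10 cm between the two front VLC cameras. That seems plausible, and it also suggests the calibration works in meters, so my ~3 result would be 3 m rather than 3 mm.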

Minimal reproducible example code

```python
import numpy as np
import cv2
import hl2ss
import hl2ss_3dcv

# Calibration folder (must exist but can be empty)
calibration_path = 'calibration'

host = "offline"

# Ports
shape = hl2ss.Parameters_RM_VLC.SHAPE
port_left = hl2ss.StreamPort.RM_VLC_LEFTFRONT
port_right = hl2ss.StreamPort.RM_VLC_RIGHTFRONT

calibration_l = hl2ss_3dcv.get_calibration_rm(host, port_left, calibration_path)
calibration_r = hl2ss_3dcv.get_calibration_rm(host, port_right, calibration_path)

rotation_lf = hl2ss_3dcv.rm_vlc_get_rotation(port_left)
rotation_rf = hl2ss_3dcv.rm_vlc_get_rotation(port_right)

K1, Rt1 = hl2ss_3dcv.rm_vlc_rotate_calibration(calibration_l.intrinsics, calibration_l.extrinsics, rotation_lf)
K2, Rt2 = hl2ss_3dcv.rm_vlc_rotate_calibration(calibration_r.intrinsics, calibration_r.extrinsics, rotation_rf)

# Get stereo calibration and rectify --------------------------------------
stereo_calibration = hl2ss_3dcv.rm_vlc_stereo_calibrate(K1, K2, Rt1, Rt2)
stereo_rectification = hl2ss_3dcv.rm_vlc_stereo_rectify(K1, K2, stereo_calibration.R, stereo_calibration.t, shape)

# Load images from files
data_left = np.load('saved/left_vlc_raw.npy')
data_right = np.load('saved/right_vlc_raw.npy')

# Undistort and rectify frames ----------------------------------------
lf_u = cv2.remap(data_left, calibration_l.undistort_map[:, :, 0], calibration_l.undistort_map[:, :, 1], cv2.INTER_LINEAR)
rf_u = cv2.remap(data_right, calibration_r.undistort_map[:, :, 0], calibration_r.undistort_map[:, :, 1], cv2.INTER_LINEAR)
lf_ru = hl2ss_3dcv.rm_vlc_rotate_image(lf_u, rotation_lf)
rf_ru = hl2ss_3dcv.rm_vlc_rotate_image(rf_u, rotation_rf)
r1 = cv2.remap(lf_ru, stereo_rectification.map1[:, :, 0], stereo_rectification.map1[:, :, 1], cv2.INTER_LINEAR)
r2 = cv2.remap(rf_ru, stereo_rectification.map2[:, :, 0], stereo_rectification.map2[:, :, 1], cv2.INTER_LINEAR)

x_l = 275
y_l = 411
x_r = 204
y_r = 411

showimage = True
if showimage:
    image_l = hl2ss_3dcv.rm_vlc_to_rgb(r1)
    image_r = hl2ss_3dcv.rm_vlc_to_rgb(r2)
    cv2.circle(image_l, (x_l, y_l), 2, (0, 255, 0), 2) # Draw a green circle
    cv2.circle(image_r, (x_r, y_r), 2, (0, 255, 0), 2)
    cv2.imshow('Left Image', image_l)
    cv2.imshow('Right Image', image_r)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

pts_left_T = np.array([[x_l], [y_l]])
pts_right_T = np.array([[x_r], [y_r]])

points_4d_homogeneous = cv2.triangulatePoints(stereo_rectification.P1, stereo_rectification.P2, pts_left_T, pts_right_T)
points_3d = points_4d_homogeneous[:3] / points_4d_homogeneous[3]

T = (0, 0, 0)
distance = np.sqrt(np.sum((points_3d - T)**2))
print(distance)
```
jdibenes commented 4 months ago

Hello. For triangulation, you have to use unrectified coordinates. If you want to use rectified coordinates, you have to undo the rectification using the Q matrix, as in OpenCV. See stereoRectify() in https://docs.opencv.org/3.4/d9/d0c/group__calib3d.html#ga617b1685d4059c6040827800e72ad2b6
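To make that concrete, here is a minimal sketch of undoing the rectification, assuming Q is the 4x4 disparity-to-depth matrix returned by cv2.stereoRectify (whether hl2ss_3dcv.rm_vlc_stereo_rectify stores it is an assumption on my part; if not, cv2.stereoRectify returns it directly):

```python
import numpy as np

# (x_l, y_l) and (x_r, y_r) are rectified pixel coordinates of the same
# keypoint; after rectification the rows should match, so y_l == y_r.
d = x_l - x_r                     # disparity along the rectified rows
p = np.array([x_l, y_l, d, 1.0])  # homogeneous [x, y, disparity, 1]
X = Q @ p                         # Q maps (x, y, d, 1) to (X, Y, Z, W)
point_3d = X[:3] / X[3]           # dehomogenize; units follow the calibration
```

This yields the point in the left rectified camera frame, the same frame in which the rectified P1/P2 live.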

Zulex commented 4 months ago

Thanks for the clarification @jdibenes. I think I had misinterpreted the P1 output, as its description reads as though it is the projection matrix in the new rectified frames. I couldn't get it to work by undoing the rectification, so I decided to try triangulating directly on the unrectified images instead, as I can find my keypoints without rectification.

However, this also doesn't seem to work. I think I am misunderstanding the intrinsics/extrinsics and mixing up row-major and column-major operations. I thought this process would be fairly easy, and I have tried many sanity checks and flipped everything around, but the numbers still don't make sense.

I think in theory I should be getting 'correct' positions by doing this:

```python
# Image to rignode matrix
Pl = hl2ss_3dcv.camera_to_rignode(calibration_l.extrinsics) @ hl2ss_3dcv.image_to_camera(calibration_l.intrinsics)
Pr = hl2ss_3dcv.camera_to_rignode(calibration_r.extrinsics) @ hl2ss_3dcv.image_to_camera(calibration_r.intrinsics)

# Transpose to get the translation in the last column
Pl = Pl.T
Pr = Pr.T

# Make it a 3x4 matrix that can be used in triangulatePoints
Pl = Pl[:3, :4]
Pr = Pr[:3, :4]
```

After that, I run cv2.triangulatePoints on the undistorted images (for which I kept the original rotation):

```python
points_4d_homogeneous = cv2.triangulatePoints(Pl, Pr, pts_left_T, pts_right_T)
points_3d = cv2.convertPointsFromHomogeneous(points_4d_homogeneous.T)
T = (0, 0, 0)
distance = np.sqrt(np.sum((points_3d - T)**2))
```

My keypoints are in cv2 pixel coordinates, where (0,0) is the top left, and they are not normalized (so every pixel coordinate is an integer). I have also tried the distorted images.
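One detail worth double-checking, as an assumption on my part rather than something verified in this thread: cv2.triangulatePoints expects the image points as 2xN floating-point arrays, one column per correspondence, so it can help to build them explicitly as floats:

```python
import numpy as np

# 2xN arrays (here N = 1), explicitly floating point,
# in the layout cv2.triangulatePoints expects
pts_left_T  = np.array([[x_l], [y_l]], dtype=np.float64)
pts_right_T = np.array([[x_r], [y_r]], dtype=np.float64)
```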

Any pointers in the right direction would be hugely appreciated!

Full minimal reproducible example code for reference

```python
import numpy as np
import cv2
import hl2ss
import hl2ss_3dcv

# Calibration folder (must exist but can be empty)
calibration_path = 'calibration'

host = "offline"

# Ports
shape = hl2ss.Parameters_RM_VLC.SHAPE
port_left = hl2ss.StreamPort.RM_VLC_LEFTFRONT
port_right = hl2ss.StreamPort.RM_VLC_RIGHTFRONT

calibration_l = hl2ss_3dcv.get_calibration_rm(host, port_left, calibration_path)
calibration_r = hl2ss_3dcv.get_calibration_rm(host, port_right, calibration_path)

# Image to rignode matrix
Pl = hl2ss_3dcv.camera_to_rignode(calibration_l.extrinsics) @ hl2ss_3dcv.image_to_camera(calibration_l.intrinsics)
Pr = hl2ss_3dcv.camera_to_rignode(calibration_r.extrinsics) @ hl2ss_3dcv.image_to_camera(calibration_r.intrinsics)

# Transpose to get the translation in the last column
Pl = Pl.T
Pr = Pr.T

# Make it a 3x4 matrix that can be used in triangulatePoints
Pl = Pl[:3, :4]
Pr = Pr[:3, :4]

# Load images from files
data_left = np.load('saved/left_vlc_raw.npy')
data_right = np.load('saved/right_vlc_raw.npy')

# Undistort frames ----------------------------------------
image_l = cv2.remap(data_left, calibration_l.undistort_map[:, :, 0], calibration_l.undistort_map[:, :, 1], cv2.INTER_LINEAR)
image_r = cv2.remap(data_right, calibration_r.undistort_map[:, :, 0], calibration_r.undistort_map[:, :, 1], cv2.INTER_LINEAR)

x_l = 413
y_l = 200
x_r = 215
y_r = 200

showimage = False
if showimage:
    image_l = hl2ss_3dcv.rm_vlc_to_rgb(image_l)
    image_r = hl2ss_3dcv.rm_vlc_to_rgb(image_r)
    cv2.circle(image_l, (x_l, y_l), 2, (0, 255, 0), 2) # Draw a green circle
    cv2.circle(image_r, (x_r, y_r), 2, (0, 255, 0), 2)
    cv2.imshow('Left Image', image_l)
    cv2.imshow('Right Image', image_r)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

pts_left_T = np.array([[x_l], [y_l]])
pts_right_T = np.array([[x_r], [y_r]])

points_4d_homogeneous = cv2.triangulatePoints(Pl, Pr, pts_left_T, pts_right_T)
points_3d = cv2.convertPointsFromHomogeneous(points_4d_homogeneous.T)

T = (0, 0, 0)
distance = np.sqrt(np.sum((points_3d - T)**2))
```
jdibenes commented 4 months ago

Here is an example for triangulation on unrectified images:

```python
# Get camera calibrations -------------------------------------------------
calibration_lf = hl2ss_3dcv.get_calibration_rm(host, port_left,  calibration_path)
calibration_rf = hl2ss_3dcv.get_calibration_rm(host, port_right, calibration_path)

rotation_lf = hl2ss_3dcv.rm_vlc_get_rotation(port_left)
rotation_rf = hl2ss_3dcv.rm_vlc_get_rotation(port_right)

K1, Rt1 = hl2ss_3dcv.rm_vlc_rotate_calibration(calibration_lf.intrinsics, calibration_lf.extrinsics, rotation_lf)
K2, Rt2 = hl2ss_3dcv.rm_vlc_rotate_calibration(calibration_rf.intrinsics, calibration_rf.extrinsics, rotation_rf)

# Get projection matrices (X_3DH.T @ Rt1 @ K1) -----------------------------
P1 = hl2ss_3dcv.rignode_to_camera(Rt1) @ hl2ss_3dcv.camera_to_image(K1)
P2 = hl2ss_3dcv.rignode_to_camera(Rt2) @ hl2ss_3dcv.camera_to_image(K2)

# Using undistorted, rotated, unrectified images --------------------------
# lf_u = cv2.remap(data_left.payload,  calibration_lf.undistort_map[:, :, 0], calibration_lf.undistort_map[:, :, 1], cv2.INTER_LINEAR)
# rf_u = cv2.remap(data_right.payload, calibration_rf.undistort_map[:, :, 0], calibration_rf.undistort_map[:, :, 1], cv2.INTER_LINEAR)
# lf_ru = hl2ss_3dcv.rm_vlc_rotate_image(lf_u, rotation_lf)
# rf_ru = hl2ss_3dcv.rm_vlc_rotate_image(rf_u, rotation_rf)
# image_l = hl2ss_3dcv.rm_vlc_to_rgb(lf_ru)
# image_r = hl2ss_3dcv.rm_vlc_to_rgb(rf_ru)
# image = np.hstack((image_l, image_r))

# Get two correspondences from checkerboard pattern -----------------------
p1 = np.array([[523/2,        575/2       ], [249/2, 249/2]]) # Image L points [[x1, x2], [y1, y2]]
p2 = np.array([[1321/2 - 480, 1373/2 - 480], [245/2, 243/2]]) # Image R points [[x1, x2], [y1, y2]]

# Triangulate -------------------------------------------------------------
points = cv2.triangulatePoints(P1[:, :3].T, P2[:, :3].T, p1, p2)

x1 = points[:3, 0] / points[3, 0]
x2 = points[:3, 1] / points[3, 1]

print(np.linalg.norm(x1 - x2)) # Measure distance between two 3D points in checkerboard pattern (known distance)
```
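
Two details in this example are worth spelling out, based on my reading of the code above: the projection must go world-to-image, so it is built from rignode_to_camera(Rt) @ camera_to_image(K) (the inverse of the camera_to_rignode @ image_to_camera chain tried earlier), and hl2ss stores its transforms for row vectors, as the X_3DH.T @ Rt1 @ K1 comment indicates, so P1[:, :3].T extracts the 4x3 part and transposes it into the 3x4 column-vector form that cv2.triangulatePoints expects.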
Zulex commented 4 months ago

Thanks, that did the trick!