microsoft / psi

Platform for Situated Intelligence
https://github.com/microsoft/psi/wiki

Change the coordinate system to OpenCV #227

Closed HaozheQi closed 2 years ago

HaozheQi commented 2 years ago

Hi, I was trying to use OpenCV's cv2.projectPoints() to project the 3D hand and eye poses extracted by HoloLensCapture back into image space, but it seems that the extracted 3D hand joints and OpenCV use different camera coordinate conventions. When I extract the hand joints from './Export/Hands/Left.txt' using

joints = np.float64(one_line_data[4:]).reshape((26, 4, 4))[:, :3, 3]  # 26 joint poses (4x4 each); keep each translation

and extract the camera pose from "./Export/Video/Pose.txt" using

cam_pose = np.float64(one_line_data[1:]).reshape((4, 4))  # 4x4 camera pose matrix

I cannot generate a correct projection result using

xy, _ = cv2.projectPoints(joints, cv2.Rodrigues(cam_pose[:3, :3])[0], cam_pose[:3, 3], intrinsics, distortion)  # cv2.Rodrigues returns (rvec, jacobian)

Do you have any idea how to transform the extracted 3D hand joints into the OpenCV coordinate system?

sandrist commented 2 years ago

In \psi, the basis assumption used throughout (as inherited from MathNET) is that Forward=X, Left=Y, and Up=Z:

        Z
        |   X
        |  /
        | /
 Y <----+

In OpenCV, I believe the assumption is that Right=X, Down=Y, and Forward=Z. So you'll have to apply a change-of-basis transformation to convert \psi coordinate systems into OpenCV coordinate systems if you want to call OpenCV functions on them.
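
To make that change of basis concrete, here is a minimal sketch (points_psi is a placeholder name for an (N, 3) array of points already expressed in the camera's \psi frame; the matrix follows directly from the two conventions above):

import numpy as np

# Forward=X, Left=Y, Up=Z  ->  Right=X, Down=Y, Forward=Z:
#   x_cv = -y_psi,   y_cv = -z_psi,   z_cv = x_psi
PSI_TO_OPENCV = np.array([[0.0, -1.0, 0.0],
                          [0.0, 0.0, -1.0],
                          [1.0, 0.0, 0.0]])

points_cv = points_psi @ PSI_TO_OPENCV.T  # (N, 3) in the OpenCV camera basis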

For your specific problem, you might also want to take a look at ProjectHandsToCamerasTask.cs, which is useful for projecting hand joints to camera views and visualizing them in PsiStudio. It leverages a GetPixelPosition method to convert 3D points into 2D pixel locations.

HaozheQi commented 2 years ago

Hi, thank you very much for your response. Unfortunately, I still don't know how to transform the joints into the OpenCV camera coordinates. Given the basis assumption, I rewrote the code like this:

joints = np.float64(one_line_data[4:]).reshape((26, 4, 4))[:, :3, 3]  # from './Export/Hands/Left.txt'
cam_pose = np.float64(one_line_data[1:]).reshape((4, 4))  # from './Export/Video/Pose.txt'
joints_cam_coord = (cam_pose[:3, :3].dot(joints.T) + cam_pose[:3, 3:4]).T  # joints from world coordinates to RGB camera coordinates
joints_opencv = np.array([-joints_cam_coord[:, 1], -joints_cam_coord[:, 2], joints_cam_coord[:, 0]]).T  # from the MathNET basis to the OpenCV basis
xy, _ = cv2.projectPoints(joints_opencv, np.zeros(3), np.zeros(3), intrinsics, distortion)  # project to 2D

However, I cannot obtain a correct reprojection with this code. Is there a mistake in my basis change? And does cam_pose represent the transformation from the RGB camera to the world, or from the world to the RGB camera?

taeinkwon commented 2 years ago

Hi, in my case I used np.linalg.inv(cam_pose) instead of cam_pose; it looks like cam_pose is camera-to-world. Here is the part of my code that projects hand joints onto the images.

img_pose = img_pose_array[frame][1:].reshape(4, 4)  # camera-to-world pose for this frame

# World -> camera via np.linalg.inv(img_pose), then \psi basis -> OpenCV basis via axis_transform.
# The joint arrays are (3, N); appending a row of ones makes them homogeneous (4, N).
hand_point_right = np.dot(axis_transform, np.dot(
    np.linalg.inv(img_pose),
    np.concatenate((hand_point_right, [[1] * np.shape(hand_point_right)[1]]))))
hand_point_left = np.dot(axis_transform, np.dot(
    np.linalg.inv(img_pose),
    np.concatenate((hand_point_left, [[1] * np.shape(hand_point_left)[1]]))))

# Identity extrinsics: the points are already in the OpenCV camera frame.
rvec = np.array([[0.0, 0.0, 0.0]])
tvec = np.array([0.0, 0.0, 0.0])

img_points_left, _ = cv2.projectPoints(
    hand_point_left[:3], rvec, tvec, img_intrinsics, np.array([]))
img_points_right, _ = cv2.projectPoints(
    hand_point_right[:3], rvec, tvec, img_intrinsics, np.array([]))
HaozheQi commented 2 years ago

Hi Taein, thanks for your help; your suggestion and code are very helpful. The only thing is, could you please provide the value of the axis_transform parameter? I made a guess at it by rotating 90 degrees about the Y axis and -90 degrees about the X axis:

trans_X = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, -1, 0, 0], [0, 0, 0, 1]])  # rotate -90 degrees about X
trans_Y = np.array([[0, 0, 1, 0], [0, 1, 0, 0], [-1, 0, 0, 0], [0, 0, 0, 1]])  # rotate 90 degrees about Y
axis_transform = np.dot(trans_Y, trans_X)

But it seems the projection still has a slight error (see the attached screenshot of the slightly offset projection). Is the axis_transform that I created correct?

taeinkwon commented 2 years ago

Hi,

It looks like the hand points are x-inverted. Could you invert the x-axis? Also, the axis_transform I use is the following:

axis_transform = np.linalg.inv(np.array([[0, 0, 1, 0], [-1, 0, 0, 0], [0, -1, 0, 0], [0, 0, 0, 1]]))
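
Putting the thread's conclusions together, here is a minimal end-to-end sketch (hand_line, pose_line, intrinsics, and distortion are placeholder names; the file layouts follow the first comment, and cam_pose is assumed camera-to-world as discussed above):

import numpy as np
import cv2

# Parse one exported frame.
joints_world = np.float64(hand_line[4:]).reshape((26, 4, 4))[:, :3, 3]  # ./Export/Hands/Left.txt
cam_pose = np.float64(pose_line[1:]).reshape((4, 4))  # ./Export/Video/Pose.txt (camera-to-world)

# \psi basis (Forward=X, Left=Y, Up=Z) -> OpenCV basis (Right=X, Down=Y, Forward=Z).
axis_transform = np.linalg.inv(np.array([[0, 0, 1, 0],
                                         [-1, 0, 0, 0],
                                         [0, -1, 0, 0],
                                         [0, 0, 0, 1]], dtype=np.float64))

# World -> camera (inverse of the exported pose), then the basis change, on homogeneous points.
joints_h = np.concatenate([joints_world, np.ones((26, 1))], axis=1)  # (26, 4)
joints_cv = (axis_transform @ np.linalg.inv(cam_pose) @ joints_h.T)[:3].T  # (26, 3)

# Extrinsics are identity: the points are already in the OpenCV camera frame.
xy, _ = cv2.projectPoints(joints_cv, np.zeros(3), np.zeros(3), intrinsics, distortion)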