marek-simonik / record3d

Accompanying library for the Record3D iOS app (https://record3d.app/). Allows you to receive an RGBD stream from iOS devices with TrueDepth camera(s).
GNU Lesser General Public License v2.1

How to get image coordinate using K and camera pose? #55

Closed RaiAnant closed 1 year ago

RaiAnant commented 1 year ago

Hi,

I have been using record3d for a while for my project. I want to get the pixel projection of an xyz coordinate expressed in a camera's frame of reference. To elaborate: given a video, I want to project the camera position at time T onto the frame captured at time T-k.

I am guessing we will have to use the K matrix provided in the metadata of the .r3d export. The most intuitive approach I could think of was to build the 4x4 camera pose matrix A from the camera poses provided in the export. Then, to get the position of the camera at time T with respect to frame T-k, I compute D = inv(A(T-k)) * A(T). From D I take the xyz position in the camera frame (xyz_cam), and finally project it onto the image with xy_img = K * xyz_cam. But the values I get after doing this don't look correct.
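In code, the approach I am describing looks roughly like this (just a sketch, assuming the exported poses are 4x4 camera-to-world matrices, and including the perspective divide by z after multiplying by K):

import numpy as np

def project_naive(A_ref, A_query, K):
    # A_ref:   4x4 camera-to-world pose at time T-k (the frame to project into)
    # A_query: 4x4 camera-to-world pose at time T (the camera whose position we project)
    # K:       3x3 intrinsics matrix
    D = np.linalg.inv(A_ref) @ A_query  # pose of camera T expressed in camera T-k's frame
    xyz_cam = D[:3, 3]                  # position of camera T in camera T-k coordinates
    uvw = K @ xyz_cam                   # homogeneous image coordinates
    return uvw[:2] / uvw[2]             # perspective divide -> (x, y) in pixels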

So I wanted to know the right way to go about this. I would really appreciate it if you could help me resolve this issue, thanks!

marek-simonik commented 1 year ago

Hi, I replied to your email. Feel free to send me an email or reopen this issue if needed.

RaiAnant commented 1 year ago

Hi

Thanks a lot for your reply. With your suggestion, I was able to make my code work. I will post the code below for future reference, in case others stumble on the same problem.

import json

import cv2
import numpy as np
from scipy.spatial.transform import Rotation as R
from quaternion import quaternion, as_rotation_matrix

# Change of basis that flips the y and z axes
# (ARKit-style camera coordinates -> OpenCV-style camera coordinates)
P = np.asarray([[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]])

def apply_permutation_transform(matrix):
    # Apply the axis flip to both sides of a 4x4 pose (change of basis)
    return P @ matrix @ P.T

def get_mat_from_pose(pose):
    # Build a 4x4 camera-to-world matrix from a pose stored in the metadata:
    # either [px, py, pz, roll, pitch, yaw] or [qx, qy, qz, qw, px, py, pz]
    matrix = np.eye(4)
    if len(pose) == 6:
        px, py, pz, r, p, y = pose
        matrix[:3, :3] = R.from_euler('xyz', [r, p, y], degrees=False).as_matrix()
    else:
        qx, qy, qz, qw, px, py, pz = pose
        matrix[:3, :3] = as_rotation_matrix(quaternion(qw, qx, qy, qz))
    matrix[:3, -1] = [px, py, pz]
    return matrix

# 'dataset' is the name of the exported capture folder (placeholder; adjust to your own export)
dataset = 'my_capture'
with open('./trajs/' + dataset + '/metadata', "rb") as f:
    metadat = json.loads(f.read())

# Camera intrinsics (the metadata stores K in column-major order, hence the transpose)
K = np.asarray(metadat['K']).reshape((3, 3)).T

# Frame indices: project the position of the camera at frame n back onto the image of frame i
i = 40
n = 60
mati = get_mat_from_pose(metadat['poses'][i])
matn = get_mat_from_pose(metadat['poses'][n])

# Camera-to-world pose of frame n, converted to the OpenCV axis convention
matn_trf = apply_permutation_transform(matn)

# 3D point in the frame-n camera frame of reference (homogeneous coordinates).
# Since I just want to project the camera position itself, xyz_wrt_cam = (0, 0, 0).
xyz_wrt_cam = np.array([[0.0, 0.0, 0.0, 1.0]])
obj_pts = (matn_trf @ xyz_wrt_cam.T)[:3]  # world coordinates of camera n, shape (3, 1)

# World -> camera-i transform (extrinsics of the reference frame)
cam_to_world = apply_permutation_transform(mati)
world_to_cam = np.linalg.inv(cam_to_world)

rvec = world_to_cam[:3, :3]  # cv2.projectPoints also accepts a 3x3 rotation matrix here
tvec = world_to_cam[:3, 3]

# Pixel coordinates (x, y) of camera n's position in the image of frame i
xyz_img = cv2.projectPoints(obj_pts, rvec, tvec, K, np.array([]))[0][0][0]
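
For anyone who finds this later: as far as I understand, the P matrix above flips the y and z axes, which converts Record3D's ARKit-style camera coordinates (x right, y up, z backward) to the OpenCV convention expected by cv2.projectPoints (x right, y down, z forward). A quick sanity check is to draw the projected point onto frame i; the image path below is only a placeholder for however you load the RGB frame of your export:

# Placeholder path; adjust to wherever the RGB image of frame i is stored
img = cv2.imread('./trajs/' + dataset + '/rgbd/' + str(i) + '.jpg')
pt = (int(round(xyz_img[0])), int(round(xyz_img[1])))
cv2.circle(img, pt, 8, (0, 0, 255), -1)  # red dot at the projected camera position
cv2.imwrite('projected_' + str(i) + '.png', img)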