Clarification on Camera Convention Used in Dataset Pickle Parameters

Hello,

Thanks for the great work!

I am working with your dataset, whose parameter files include cam_R, cam_T, fx_ndc, and fy_ndc. These parameters appear to follow the PyTorch3D camera convention, where "+X points left, +Y points up, and +Z points out from the image plane" (as described here: https://pytorch3d.org/docs/cameras), because they are passed directly to pytorch3d.renderer.PerspectiveCameras.

I would like to convert these camera parameters to the OpenCV camera convention, which many NeRF (Neural Radiance Fields) projects use. In the OpenCV convention, +X points right, +Y points down, and +Z points out from the image plane, into the scene.

I noticed that the dataset files, for example 'data/hand_only/hand1/PARAM_1064/65_21320028.pickle', contain two sets of parameters: 'cam_R' and 'cam_T', and 'R' and 'T'. A conversion seems to have been applied, since R = cam_R.transpose() and T = cam_T, but I am not sure whether it matches the usual OpenCV convention.

The PyTorch3D source for pytorch3d.utils.camera_conversions (https://pytorch3d.readthedocs.io/en/latest/_modules/pytorch3d/utils/camera_conversions.html) shows how to convert camera parameters to the OpenCV convention. The relevant lines are:

R_pytorch3d = cameras.R.clone() # (batchsize, 3, 3)
T_pytorch3d = cameras.T.clone() # (batchsize, 3)
T_pytorch3d[:, :2] *= -1
R_pytorch3d[:, :, :2] *= -1
tvec = T_pytorch3d
R = R_pytorch3d.permute(0, 2, 1)
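
To make the effect of those lines concrete, here is a minimal NumPy sketch of the same steps. The camera values are made up for illustration and are not taken from the dataset. It assumes the PyTorch3D row-vector convention X_cam = X_world @ R + T, and checks that the snippet amounts to transposing R and then negating the camera X and Y axes.

```python
import numpy as np

# Made-up batch of two PyTorch3D-style cameras (row-vector convention:
# X_cam = X_world @ R + T). These are not values from the dataset.
theta = np.deg2rad(30.0)
R_y = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                [0.0, 1.0, 0.0],
                [-np.sin(theta), 0.0, np.cos(theta)]])
R_p3d = np.stack([np.eye(3), R_y])           # (batchsize, 3, 3)
T_p3d = np.array([[0.1, -0.2, 5.0],
                  [0.0, 0.3, 4.0]])          # (batchsize, 3)

# Same steps as the quoted snippet, written with NumPy.
R_tmp, tvec = R_p3d.copy(), T_p3d.copy()
tvec[:, :2] *= -1
R_tmp[:, :, :2] *= -1
R_cv = R_tmp.transpose(0, 2, 1)

# Equivalent closed form: transpose, then flip the camera X and Y axes.
flip = np.diag([-1.0, -1.0, 1.0])
R_closed = flip @ R_p3d.transpose(0, 2, 1)
t_closed = T_p3d @ flip

# A world point lands at (-x, -y, z) of its PyTorch3D camera coordinates.
X_world = np.array([0.2, 0.1, 1.0])
X_p3d = X_world @ R_p3d[1] + T_p3d[1]        # PyTorch3D camera frame
X_cv = R_cv[1] @ X_world + tvec[1]           # OpenCV camera frame
print(X_p3d, X_cv)
```

If this reading is right, a plain transpose with an unchanged T (R = cam_R.transpose(), T = cam_T) only switches from row-vector to column-vector form and leaves the camera axes in the PyTorch3D orientation.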

Could you kindly explain the exact camera convention used for 'cam_R' and 'cam_T' in your dataset, and whether 'R' and 'T' are already converted to the OpenCV convention correctly as mentioned above? Clarification on this matter would greatly assist me in my research.

Thank you for your time and assistance!

Hello, sorry for the late reply. Given an image and its parameter file, you can use the code below to verify the camera's intrinsic and extrinsic parameters under the OpenCV convention by projecting the 3D keypoints onto the 2D image. Here 'R', 'T', and 'K' are the camera parameters.

import pickle

import cv2
import numpy as np

img_file = '***.jpeg'
param_file = '***.pickle'

# Load the per-image camera parameters and 3D joints.
with open(param_file, 'rb') as f:
    param = pickle.load(f)
R = param['cam_R'].T
T = param['cam_T']
fx_ndc = param['fx_ndc']
fy_ndc = param['fy_ndc']
px_ndc = param['px_ndc']
py_ndc = param['py_ndc']
H = param['H']
W = param['W']
joint3d = param['joint3d_21']

# Convert the NDC intrinsics to pixel units.
# Note that fx and fy come out negative when fx_ndc and fy_ndc are positive.
s = min(H, W)
K = np.eye(3)
fx = -1.0 * fx_ndc * (s - 1) / 2.0
fy = -1.0 * fy_ndc * (s - 1) / 2.0
cx = -1.0 * px_ndc * (s - 1) / 2.0 + (W - 1) / 2.0
cy = -1.0 * py_ndc * (s - 1) / 2.0 + (H - 1) / 2.0
K[0, 0], K[1, 1] = fx, fy
K[0, 2], K[1, 2] = cx, cy

# Build the 3x4 extrinsic matrix [R | T] and the full projection matrix.
view_trans = np.zeros((3, 4))
view_trans[:3, :3] = R
view_trans[:3, 3] = T

proj = K @ view_trans

# Project the 3D joints and divide by depth to get pixel coordinates.
pro_2d = joint3d @ proj[:3, :3].T + proj[:3, 3][None, ...]
pro_2d[:, :2] /= pro_2d[:, 2:]

# Draw the 21 projected joints on the image and save it.
img = cv2.imread(img_file)
point_size = 1
point_color = (0, 0, 255)
thickness = 4
for i in range(21):
    cv2.circle(img, (int(pro_2d[i, 0]), int(pro_2d[i, 1])), point_size, point_color, thickness)
cv2.imwrite('./test.png', img)
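
One way to read the code above is that R and T stay in PyTorch3D camera axes and the X/Y flip is absorbed into the negative focal lengths in K. If a K with positive focal lengths is needed, the flip can be moved into the extrinsics instead. The sketch below checks that both forms project identically. It uses made-up numbers rather than dataset values, and the helpers ndc_to_pixel_K and project are hypothetical names introduced only for this illustration.

```python
import numpy as np

def ndc_to_pixel_K(fx_ndc, fy_ndc, px_ndc, py_ndc, H, W, focal_sign):
    """Hypothetical helper: build K with the (s-1)/2 scaling used in the reply."""
    s = min(H, W)
    K = np.eye(3)
    K[0, 0] = focal_sign * fx_ndc * (s - 1) / 2.0
    K[1, 1] = focal_sign * fy_ndc * (s - 1) / 2.0
    K[0, 2] = -px_ndc * (s - 1) / 2.0 + (W - 1) / 2.0
    K[1, 2] = -py_ndc * (s - 1) / 2.0 + (H - 1) / 2.0
    return K

def project(K, R, t, pts):
    """Pinhole projection of (N, 3) world points with x_cam = R @ x + t."""
    cam = pts @ R.T + t
    uvw = cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Placeholder stand-ins for the pickle fields (assumed, not real data).
theta = np.deg2rad(20.0)
cam_R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(theta), 0.0, np.cos(theta)]])
cam_T = np.array([0.1, -0.2, 5.0])
fx_ndc, fy_ndc, px_ndc, py_ndc = 2.0, 2.1, 0.05, -0.03
H, W = 480, 640
pts = np.array([[0.0, 0.0, 0.0],
                [0.05, -0.02, 0.1],
                [-0.1, 0.08, -0.05]])

# Form used in the reply: negative focal lengths, unflipped extrinsics.
K_reply = ndc_to_pixel_K(fx_ndc, fy_ndc, px_ndc, py_ndc, H, W, focal_sign=-1.0)
uv_reply = project(K_reply, cam_R.T, cam_T, pts)

# Alternative: positive focal lengths, with the X/Y flip moved into R and t.
flip = np.diag([-1.0, -1.0, 1.0])
R_cv = flip @ cam_R.T
t_cv = flip @ cam_T
K_cv = ndc_to_pixel_K(fx_ndc, fy_ndc, px_ndc, py_ndc, H, W, focal_sign=1.0)
uv_cv = project(K_cv, R_cv, t_cv, pts)

print(np.abs(uv_reply - uv_cv).max())
```

Both forms give the same pixel coordinates. The second is likely more convenient for tools that assume positive focal lengths, but it is worth confirming against a real image from the dataset before relying on it.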

iscas3dv / HO-NeRF
