how to obtain camera data frame-by-frame

yuhaoliu7456 commented 1 month ago

Hi, could you help me obtain the camera data in the format below? Thanks a lot.

"""
{
  "focal_length": 689.7886962890625,
  "image_size": [720, 960],
  "orientation": [
    [0.6210967302322388, 0.06393816322088242, -0.7811214327812195],
    [0.06391450762748718, -0.9974791407585144, -0.030827326700091362],
    [-0.7811233997344971, -0.030778242275118828, -0.6236176490783691]
  ],
  "pixel_aspect_ratio": 1.0,
  "position": [0.08624304831027985, -0.12180781364440918, -0.09078224003314972],
  "principal_point": [362.6399841308594, 481.7470397949219],
  "radial_distortion": [0.0, 0.0, 0.0],
  "skew": 0.0,
  "tangential_distortion": [0.0, 0.0]
}
"""

marek-simonik commented 1 month ago

Hi,

I don't know the exact specification of the format you showed, so I won't be able to provide enough details, but let me give you a partial answer.

The radial distortion, skew, and tangential distortion parameters can be assumed to be 0.

I'm not sure if image_size is supposed to contain the resolution of the RGB images or the resolution of the depth images; you'll have to decide what is best for your use case.

The orientation key in your example is an array of float3 values, whereas the position key is just a single float3. I would expect both the position and the orientation keys to be arrays of float3 values containing the camera pose data for each frame. If that's the case, then this is how you get the camera transform T_wc for frame i in the form of a quaternion describing the camera rotation and a vector describing the camera translation:

quat_x = metadata["poses"][i][0]
quat_y = metadata["poses"][i][1]
quat_z = metadata["poses"][i][2]
quat_w = metadata["poses"][i][3]

pos_x = metadata["poses"][i][4]
pos_y = metadata["poses"][i][5]
pos_z = metadata["poses"][i][6]

It looks like the orientation key in your example stores the camera rotation as a 3x3 rotation matrix (each of its three rows has unit length), so you will need to convert the quaternion into a rotation matrix.
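As a rough sketch of the conversion step: the three rows of the sample orientation are unit-length and mutually orthogonal, which suggests a 3x3 rotation matrix, so the example below converts the quaternion into that form. The helper names quat_to_rotation_matrix and pose_for_frame are made up for illustration and are not part of Record3D. Whether the target format expects the camera-to-world rotation or its inverse is an assumption you will need to check; for a rotation matrix, the inverse is simply the transpose.

```python
import math


def quat_to_rotation_matrix(qx, qy, qz, qw):
    """Convert a quaternion given as (x, y, z, w) into a 3x3 rotation matrix.

    The result is a list of three rows (row-major nested lists).
    """
    # Normalize first to guard against small numerical drift in the stored values.
    norm = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    if norm == 0.0:
        raise ValueError("zero-length quaternion")
    x, y, z, w = qx / norm, qy / norm, qz / norm, qw / norm

    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def pose_for_frame(metadata, i):
    """Hypothetical helper: return (rotation_matrix, position) for frame i.

    Assumes metadata["poses"][i] is [qx, qy, qz, qw, px, py, pz],
    as described in the reply above.
    """
    qx, qy, qz, qw, px, py, pz = metadata["poses"][i]
    return quat_to_rotation_matrix(qx, qy, qz, qw), [px, py, pz]
```

If the target format turns out to need the inverse rotation, transposing the returned matrix (for example with `[list(row) for row in zip(*R)]`) should be enough; the position would then need the matching transformation as well.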

Here you can see how to obtain the rest of the values from Record3D's metadata file:

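focal_length_x = metadata["K"][0]
# K appears to be a 3x3 matrix flattened column by column (the principal point
# sits at indices 6 and 7), so the vertical focal length is at index 4, not 1.
focal_length_y = metadata["K"][4]
# On iOS, it is almost always true that `focal_length_x` is the same as `focal_length_y`.
# In that case, you can set e.g. `focal_length = focal_length_x` and `pixel_aspect_ratio = 1.0`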

principal_point = [ metadata["K"][6], metadata["K"][7] ]

rgb_img_width = metadata["w"]
rgb_img_height = metadata["h"]
depth_img_width = metadata["dw"]
depth_img_height = metadata["dh"]
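Putting the static values together, here is a minimal sketch of how the per-recording part of the requested format might be filled in. The function name intrinsics_from_metadata is hypothetical. It assumes the horizontal and vertical focal lengths are equal (so pixel_aspect_ratio is 1.0), that image_size is ordered [width, height] (the sample's principal point of roughly [363, 482] for an image_size of [720, 960] points that way), and that the RGB resolution is the one wanted.

```python
import json


def intrinsics_from_metadata(metadata):
    """Hypothetical helper: build the frame-independent camera fields.

    Assumes metadata is the parsed Record3D metadata JSON with the keys
    "K", "w" and "h" used in the reply above.
    """
    K = metadata["K"]
    return {
        # Assumption: fx == fy, so a single focal length is enough.
        "focal_length": K[0],
        # Assumption: [width, height] of the RGB images.
        "image_size": [metadata["w"], metadata["h"]],
        "pixel_aspect_ratio": 1.0,
        "principal_point": [K[6], K[7]],
        # Distortion and skew are taken to be zero, as suggested in the reply.
        "radial_distortion": [0.0, 0.0, 0.0],
        "skew": 0.0,
        "tangential_distortion": [0.0, 0.0],
    }


def camera_json(metadata, orientation, position):
    """Merge the static fields with one frame's pose and serialize to JSON."""
    camera = intrinsics_from_metadata(metadata)
    camera["orientation"] = orientation
    camera["position"] = position
    return json.dumps(camera, indent=2, sort_keys=True)
```

The orientation and position arguments are left to the caller because their conventions (for example camera-to-world versus world-to-camera) depend on the target format, which is not specified in this thread.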

yuhaoliu7456 commented 1 month ago

Thank you for getting back to me so quickly. I just realized that your app does not export the data in this format directly; instead, it has to be computed from other files. BTW, here is the link I referred to: https://github.com/KAIR-BAIR/dycheck/blob/main/docs/RECORD3D_CAPTURE.md. They also used your app to generate their data.

Thanks.

marek-simonik / record3d, issue #88