Hello,

First of all, thank you for your excellent work on record3d. It has made extracting RGBD video from the iPad a smooth experience.
The question I would like to ask today is the following:
Do 'poses' in 'metadata' refer to world-to-camera transformation (extrinsics) or camera-to-world transformation?
I'm asking because different GitHub issues related to the ARKit poses provide conflicting information. For instance, in issue #31, t is referred to as the "world pose", which I assume means the coordinates of the world origin in the camera frame. This suggests that [R | t] is the world-to-camera transformation (extrinsics).
However, in the same issue, you reply that X_{world} = [R|t] X_{cam}, suggesting that [R | t] actually refers to the camera-to-world transformation.
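To make the two readings concrete, here is a small NumPy sketch of how they differ. The pose values are made up for illustration and `make_pose` / `invert_pose` are my own helper names, not part of record3d; the sketch only shows what each interpretation would imply for the camera center, not which one the app actually uses.

```python
import numpy as np

def make_pose(R, t):
    """Build a 4x4 homogeneous matrix from a 3x3 rotation R and a 3-vector t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert_pose(T):
    """Invert a rigid transform: the inverse of [R | t] is [R^T | -R^T t]."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

# Hypothetical pose (NOT taken from a real recording):
# 90 degree rotation about the z axis plus a translation.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([1.0, 2.0, 3.0])
T = make_pose(R, t)

cam_origin = np.array([0.0, 0.0, 0.0, 1.0])  # camera center in the camera frame

# Reading A: [R | t] is camera-to-world, X_world = [R | t] X_cam.
# The camera center in world coordinates is then simply t.
center_if_cam_to_world = (T @ cam_origin)[:3]

# Reading B: [R | t] is world-to-camera (extrinsics), X_cam = [R | t] X_world.
# The camera center in world coordinates is then -R^T t.
center_if_world_to_cam = (invert_pose(T) @ cam_origin)[:3]

print("camera center if camera-to-world:", center_if_cam_to_world)
print("camera center if world-to-camera:", center_if_world_to_cam)
```

Under reading A the translation column is the camera position itself, so plotting t over the frames of a recording should trace the path the device physically moved along; under reading B that path would only appear after inverting each pose.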
I would really appreciate it if you could resolve this confusion.
Thank you.