Do 'poses' in 'metadata' refer to world-to-camera transformation (extrinsics) or camera-to-world transformation?

Hello,

First of all, thank you for your excellent work on record3d. It has made extracting RGBD video from the iPad a smooth experience. My question is whether the 'poses' stored in 'metadata' describe the world-to-camera transformation (extrinsics) or the camera-to-world transformation.

I'm asking because different GitHub issues about the ARKit poses give conflicting information. For instance, in issue #31, t is referred to as the "world pose", which I assume means the coordinates of the world origin in the camera frame. This suggests that [R | t] is the world-to-camera transformation (extrinsics).
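To make the first reading concrete: a minimal NumPy sketch, assuming the world-to-camera convention X_cam = R X_world + t. The rotation and translation values are hypothetical and are not taken from any record3d file. Under this assumption the world origin maps to exactly t in camera coordinates, while the camera centre in world coordinates is -R^T t rather than t.

```python
import numpy as np

def rot_z(theta):
    """Rotation about the z axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical extrinsics [R | t], ASSUMING the world-to-camera
# convention X_cam = R @ X_world + t (not confirmed for record3d).
R = rot_z(np.pi / 2)
t = np.array([1.0, 2.0, 3.0])

def world_to_camera(X_world):
    """Map a 3D world point into camera coordinates."""
    return R @ X_world + t

# Under this convention the world origin lands at t in the camera frame ...
origin_in_cam = world_to_camera(np.zeros(3))

# ... while the camera centre expressed in world coordinates is -R^T t.
camera_center_world = -R.T @ t

# Mapping the camera centre into the camera frame gives (numerically) zero.
center_in_cam = world_to_camera(camera_center_world)
```

If the stored t were instead the camera position in world space, this sketch's reading would be wrong, which is exactly the ambiguity in question.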

However, in the same issue, you reply that X_{world} = [R|t] X_{cam}, suggesting that [R | t] actually refers to the camera-to-world transformation.
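For comparison, a minimal NumPy sketch of the second reading, assuming the camera-to-world convention X_world = R X_cam + t (that is, X_world = [R|t] X_cam in homogeneous form). The helper names `make_pose` and `invert_pose` and the pose values are hypothetical. Under this assumption t is the camera centre in world coordinates, and the extrinsics are obtained with the closed-form rigid inverse [R^T | -R^T t].

```python
import numpy as np

def make_pose(R, t):
    """Pack a 3x3 rotation and a translation into a 4x4 homogeneous matrix."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert_pose(T):
    """Closed-form inverse of a rigid transform: [R|t]^-1 = [R^T | -R^T t]."""
    R, t = T[:3, :3], T[:3, 3]
    return make_pose(R.T, -R.T @ t)

# Hypothetical pose, ASSUMING X_world = R @ X_cam + t (camera-to-world).
theta = np.pi / 6
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.5, -1.0, 2.0])
cam_to_world = make_pose(R, t)

# The camera centre (X_cam = 0, homogeneous w = 1) sits at t in world space.
center_h = cam_to_world @ np.array([0.0, 0.0, 0.0, 1.0])

# The extrinsics (world-to-camera) are the inverse of the camera-to-world pose.
world_to_cam = invert_pose(cam_to_world)
```

The two readings differ only by this inversion, so a pose interpreted under the wrong convention is still a valid rigid transform; it simply moves points the wrong way.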

I would really appreciate it if you could resolve this confusion. Thank you.

(marek-simonik / record3d, issue #36)