Closed by xwjabc 2 years ago
Hi, I'm converting the poses so that the axes are X = right/left, Y = up/down, Z = forward/backward.
This is the convention used by most datasets I've encountered, such as ETH3D and TUM-RGBD.
The main reason for the conversion is that this is the convention I'm most comfortable working with. It also leads to simpler equations for projecting points between images.
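To illustrate why a Z-forward convention keeps the projection equations simple, here is a minimal sketch of reprojecting a pixel from one image into another with a pinhole model. The function name `reproject`, the intrinsics tuple `K = (fx, fy, cx, cy)`, and the 4x4 relative transform `T_ij` are assumptions made for this example, not code from the repository.

```python
import numpy as np

def reproject(uv, depth, K, T_ij):
    """Map a pixel in image i to image j (hypothetical helper).

    uv    : (u, v) pixel coordinates in image i
    depth : depth of the point along camera i's Z (forward) axis
    K     : (fx, fy, cx, cy) pinhole intrinsics, assumed shared by both images
    T_ij  : 4x4 rigid transform taking points from camera i's frame to camera j's
    """
    fx, fy, cx, cy = K
    u, v = uv
    # Back-project the pixel to a homogeneous 3D point in camera i's frame.
    p_i = np.array([(u - cx) / fx * depth, (v - cy) / fy * depth, depth, 1.0])
    # Move the point into camera j's frame.
    X, Y, Z, _ = T_ij @ p_i
    # Project: with Z pointing forward, this is just a division by Z.
    return np.array([fx * X / Z + cx, fy * Y / Z + cy])

K = (100.0, 100.0, 50.0, 50.0)
T_shift = np.eye(4)
T_shift[0, 3] = 0.5  # the point sits 0.5 units further right in camera j's frame
same_pixel = reproject((60.0, 40.0), 2.0, K, np.eye(4))
shifted_pixel = reproject((60.0, 40.0), 2.0, K, T_shift)
```

With the identity transform the pixel maps to itself. With the shifted transform only the u coordinate changes.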
Got it. Thank you for your quick reply!
I have a follow-up question about the pose parsing: in this line, the pose matrix is inverted. Is that because the original pose is defined as a camera-to-world transformation and is converted here into a world-to-camera transformation?
Based on this interpretation, I drew a figure to show my understanding of Eq. (3). (I have already converted the axes from NED to your setting, which seems to be called "CAM" in tartanair_tools.)
Is my understanding correct? Thanks!
Yes, that all looks correct to me. The datasets represent poses as camera-to-world transformations. DROID-SLAM estimates world-to-camera poses, which are converted back to camera-to-world for evaluation.
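As a rough sketch of the relationship between the two conventions: inverting a camera-to-world pose gives the corresponding world-to-camera pose. The example below assumes poses stored as 4x4 homogeneous matrices, and the helper name `invert_pose` is made up for illustration. The repository may represent poses differently.

```python
import numpy as np

def invert_pose(T):
    """Invert a 4x4 rigid transform [R | t; 0 0 0 1] (hypothetical helper)."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T        # the inverse of a rotation is its transpose
    T_inv[:3, 3] = -R.T @ t    # the translation is rotated back and negated
    return T_inv

# Camera-to-world pose: rotated 90 degrees about Y, camera centre at (1, 0, 2).
c2w = np.array([[ 0.0, 0.0, 1.0, 1.0],
                [ 0.0, 1.0, 0.0, 0.0],
                [-1.0, 0.0, 0.0, 2.0],
                [ 0.0, 0.0, 0.0, 1.0]])
w2c = invert_pose(c2w)

# The camera centre, expressed in world coordinates, maps to the camera origin.
centre_in_cam = w2c @ np.array([1.0, 0.0, 2.0, 1.0])
```

Applying `invert_pose` again to `w2c` recovers `c2w`, which mirrors converting the estimates back to camera-to-world for evaluation.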
Gotcha. Thank you for your clarification!
Hi, thank you for your great work! While reading the data-loading part of Droid-SLAM, I found that the pose is parsed as:
According to the TartanAir tools, each line of the pose data file has the format
tx ty tz qx qy qz qw
which uses a NED frame. Your implementation seems to reorder the XYZ coordinates to YZX. Is there a reason behind this? Thanks!
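For reference, an XYZ-to-YZX reordering of a `tx ty tz qx qy qz qw` row can be sketched as a single index permutation. This is a minimal illustration assuming NED axes (x forward, y right, z down) and a target frame with x right, y down, z forward. The names `NED_TO_CAM` and `ned_to_cam` are invented for the example. Because a cyclic permutation of the axes is a proper rotation, the quaternion's vector part is permuted in the same way as the translation and `qw` stays where it is.

```python
import numpy as np

# Column order for the reordered pose: (ty, tz, tx, qy, qz, qx, qw).
NED_TO_CAM = [1, 2, 0, 4, 5, 3, 6]

def ned_to_cam(poses):
    """Reorder pose rows (tx ty tz qx qy qz qw) from XYZ to YZX axes.

    Accepts a single row or an (N, 7) array and always returns an (N, 7) array.
    """
    poses = np.atleast_2d(np.asarray(poses, dtype=np.float64))
    return poses[:, NED_TO_CAM]

pose_ned = [1.0, 2.0, 3.0, 0.1, 0.2, 0.3, 0.9]
pose_cam = ned_to_cam(pose_ned)
```

Here the translation (1, 2, 3) becomes (2, 3, 1), and the quaternion vector part (0.1, 0.2, 0.3) becomes (0.2, 0.3, 0.1).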