yzcjtr / GeoNet

Code for GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose (CVPR 2018)
MIT License

Creating ORB-SLAM (full) snippets #63

Open hemangchawla opened 4 years ago

hemangchawla commented 4 years ago

Hello,

How do you make the 5-frame snippets from the output of ORB-SLAM full? ORB-SLAM outputs only the Keyframe poses (monocular case) and therefore the creation of snippets is unclear. Also in this case of ATE-5 computation, do you use 5 Keyframes of ORB SLAM or Keyframes corresponding to 5 frames of the Ground Truth trajectory?

yzcjtr commented 4 years ago

Any modern SLAM system, including ORB-SLAM, should be able to predict the camera motion for every single frame, so I'm afraid your assumption doesn't hold. The ORB-SLAM results are referenced from https://github.com/tinghuiz/SfMLearner for fair comparison purposes.

hemangchawla commented 4 years ago

Thanks! However, only ORB-SLAM (stereo) can recover the complete trajectory consisting of all frames rather than just keyframes; it would be incorrect to use all frames in the monocular setting. See here: https://github.com/raulmur/ORB_SLAM2/issues/60#issuecomment-207330963

> Frames are never optimized by bundle adjustment or corrected when closing a loop. The frame pose is only correct at the moment it was computed, and you should never use it afterwards. You could store the relative pose to a reference keyframe and compute the frame pose from the keyframe pose. This is done in the stereo setting to recover the full trajectory (see the SaveTrajectoryKITTI function in System). However, in the monocular setting this would be incorrect: the relative transformation is a rigid-body transformation and will not take scale corrections into account.
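For reference, the recovery Raul describes amounts to chaining homogeneous transforms: the frame's world pose is the (optimized) keyframe world pose composed with the stored relative pose. A minimal sketch of that composition, assuming 4x4 homogeneous matrices (the names are illustrative, not ORB-SLAM's API):

```python
import numpy as np

def recover_frame_pose(T_w_kf, T_kf_f):
    """Recover a frame's world pose by composing the (loop-closure-corrected)
    keyframe world pose T_w_kf with the stored frame-to-keyframe relative
    pose T_kf_f. Both are 4x4 homogeneous transforms."""
    return T_w_kf @ T_kf_f
```

In the monocular case this composition implicitly assumes both transforms share one scale, which is exactly the assumption Raul's comment warns about.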

yzcjtr commented 4 years ago

I see no contradiction here. Assuming you only get the keyframe poses and the relative poses to some keyframes, you can still recover the full camera trajectory. The "incorrectness" Raul mentioned refers to the scale ambiguity, which cannot be resolved by monocular SLAM; the rotation and normalized translation are still correct. Moreover, we optimize the trajectory scale to make it the same as the ground truth poses. This can be found in the pose evaluation code.
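For context, the SfMLearner-style trajectory-scale optimization reduces to a closed-form least-squares factor between predicted and ground-truth translations. A sketch of that idea (not the repo's exact code; variable names are illustrative):

```python
import numpy as np

def optimize_scale(pred_xyz, gt_xyz):
    """Closed-form least-squares scale s minimizing ||gt - s * pred||^2
    over a snippet's translations: s = <gt, pred> / <pred, pred>.
    pred_xyz, gt_xyz: arrays of shape (N, 3)."""
    return np.sum(gt_xyz * pred_xyz) / np.sum(pred_xyz ** 2)
```

Applying this per snippet removes the global scale ambiguity before the ATE is computed, so up-to-scale monocular predictions can be compared fairly.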

Besides, we never ran ORB-SLAM ourselves. As noted in the paper, the ORB-SLAM numbers are referenced from Tinghui's paper.

hemangchawla commented 4 years ago

Indeed, monocular SLAM is only determined up to scale. However, by that reasoning alone, the keyframe poses and the relative frame poses would all be up to scale and therefore equally correct or "incorrect". That is not the case, since the Sim(3) optimization (R, t, s) is performed for keyframes during loop closure but not for non-keyframes. Therefore the keyframes and the relative frames end up at different scales, as I understand it.

> Moreover, we optimize the trajectory scale to make it the same as the ground truth poses. This can be found in the pose evaluation code.

I see that. Thank you! I wonder, though, why Umeyama's algorithm is not used for the scaling and alignment? I notice the same in the SfMLearner repo.
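For comparison, Umeyama's method solves for rotation, translation, and scale jointly in closed form via an SVD of the cross-covariance between the two point sets, rather than fitting the scale alone. A self-contained sketch:

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Closed-form similarity transform (R, t, s) minimizing
    ||dst - (s * R @ src + t)||^2 (Umeyama, 1991).
    src, dst: corresponding 3D points, shape (N, 3)."""
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    # Reflection correction to guarantee a proper rotation (det(R) = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[-1, -1] = -1
    R = U @ S @ Vt
    var_src = np.sum(src_c ** 2) / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return R, t, s
```

Unlike a scale-only fit, this also estimates the rigid alignment, which matters when the predicted and ground-truth trajectories are expressed in different world frames.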