princeton-vl / DROID-SLAM

BSD 3-Clause "New" or "Revised" License

[Question] How keyframes are selected? #47

Closed by javierttgg 2 years ago

javierttgg commented 2 years ago

Hi @zachteed, amazing work (and an impressive track record of works) :) Thanks a lot for open-sourcing it!

I'm curious about some aspects regarding the keyframes:

1) I see in Sec. 3.4 of the paper that the frontend is in charge of selecting them. However, I believe the Frontend explanation does not describe how this selection is done.

2) On the other hand, the end of Section 3.4 mentions that non-keyframes perform motion-only bundle adjustment, whereas the frontend performs local bundle adjustment. Does this mean that an incoming frame won't be optimized in the frontend / backend unless it is selected as a keyframe?

I'd highly appreciate it if you could shed some light on this.

GCChen97 commented 2 years ago

I found two implementations for keyframe selection in the code:

  1. delta.norm(dim=-1).mean().item() > self.thresh in MotionFilter.track, which I guess is optical-flow-based.
  2. self.graph.rm_keyframe(self.ti - 2) in DroidFrontend.__update, which I guess is L2-distance-based.

I am not sure if there are more strategies.
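
To make the first condition concrete, here is a minimal sketch in plain Python of a mean-flow-magnitude test. It is not the repository's code: the function names, the list-of-tuples flow representation, and the threshold value 2.5 are assumptions chosen for illustration. It only mirrors the shape of delta.norm(dim=-1).mean().item() > self.thresh.

```python
import math

def mean_flow_magnitude(flow):
    """Mean Euclidean norm of a list of (dx, dy) flow vectors, in pixels."""
    return sum(math.hypot(dx, dy) for dx, dy in flow) / len(flow)

def is_new_keyframe(flow, thresh=2.5):
    """Accept the frame only if the average motion exceeds the threshold.

    `thresh` is an illustrative value, not a documented default.
    """
    return mean_flow_magnitude(flow) > thresh

# Hypothetical flow fields, each with two vectors.
small_motion = [(0.3, 0.4), (0.6, 0.8)]   # norms of about 0.5 and 1.0
large_motion = [(3.0, 4.0), (6.0, 8.0)]   # norms 5.0 and 10.0

print(is_new_keyframe(small_motion))  # False: too little motion, frame is skipped
print(is_new_keyframe(large_motion))  # True: enough motion, frame is kept
```

A frame with little apparent motion relative to the last kept frame adds almost no new information, which is why a test of this kind can skip it.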

javierttgg commented 2 years ago

Thanks a lot @GCChen97 :)

That answers point 1, and I believe it is also highly related to point 2.

If my understanding of the code is right, only the frames appended to Droid.video are processed, and which frames get appended seems to be decided in MotionFilter.track. Therefore, if the flow threshold you mentioned above is not satisfied, the incoming frame won't be processed in the frontend.

So to sum up, thanks to your hints I believe the answers to my previous questions are:

  1. The condition delta.norm(dim=-1).mean().item() > self.thresh determines whether a frame is set as a keyframe. The call self.graph.rm_keyframe(self.ti - 2) then updates the frontend keyframes by removing one.
  2. Yes, a frame is discarded unless it is selected as a keyframe, so it is not optimized in the frontend. The poses of the discarded frames must therefore be computed in some other part of the system.
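
The removal step in point 1 can be sketched as follows. This is only an illustration under assumptions: the thread guesses that the removal is distance-based, so the sketch takes a caller-supplied distance function and threshold, both hypothetical, and treats self.ti - 2 as the second-most-recent keyframe. It does not reproduce DroidFrontend.__update.

```python
def prune_recent_keyframe(keyframes, distance, keyframe_thresh):
    """Drop the second-most-recent keyframe if it is too close to its predecessor.

    `keyframes` is modified in place. Returns True if a keyframe was removed.
    `distance` and `keyframe_thresh` are hypothetical stand-ins for whatever
    measure the real system uses.
    """
    if len(keyframes) < 3:
        return False  # not enough history to compare
    prev, candidate = keyframes[-3], keyframes[-2]
    if distance(prev, candidate) < keyframe_thresh:
        del keyframes[-2]
        return True
    return False

# Toy example: keyframes are 1D positions, distance is the absolute difference.
keyframes = [0.0, 5.0, 5.5, 11.0]
removed = prune_recent_keyframe(keyframes, lambda a, b: abs(a - b), 2.0)
print(removed, keyframes)  # True [0.0, 5.0, 11.0]
```

The newest keyframe is kept in either case; only the one just before it is a candidate for removal.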

I'll leave the issue open in case something above is wrong or anyone has more hints.

GCChen97 commented 2 years ago

Happy to discuss this work with you :) Non-keyframe pose interpolation is not online but runs after global bundle adjustment. You can refer to trajectory_filler.py.
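
As a rough picture of what filling in non-keyframe poses offline can look like, here is a generic sketch that linearly interpolates a 3D position between the two keyframes bracketing a timestamp. This is an assumption-laden simplification and not a description of trajectory_filler.py: the function name and data layout are hypothetical, rotation is ignored, and the thread does not say how the real code computes the poses.

```python
from bisect import bisect_right

def interpolate_position(t, kf_times, kf_positions):
    """Linearly interpolate a 3D position at time t from bracketing keyframes.

    `kf_times` must be sorted and contain at least two entries.
    Times outside the keyframe range are extrapolated from the nearest segment.
    """
    i = bisect_right(kf_times, t) - 1          # last keyframe at or before t
    i = max(0, min(i, len(kf_times) - 2))      # clamp to a valid segment
    t0, t1 = kf_times[i], kf_times[i + 1]
    w = (t - t0) / (t1 - t0)                   # 0 at t0, 1 at t1
    p0, p1 = kf_positions[i], kf_positions[i + 1]
    return tuple(a + w * (b - a) for a, b in zip(p0, p1))

# Hypothetical keyframe timestamps and positions (already optimized).
kf_times = [0.0, 10.0, 20.0]
kf_positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]

print(interpolate_position(5.0, kf_times, kf_positions))   # (0.5, 0.0, 0.0)
print(interpolate_position(15.0, kf_times, kf_positions))  # (1.0, 1.0, 0.0)
```

Running this after the global optimization matters because the interpolated values inherit the final keyframe estimates rather than intermediate ones.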

javierttgg commented 2 years ago

Thanks @GCChen97, that makes a lot of sense. Thanks to your hints, I'm closing the issue :) Best