princeton-vl / DROID-SLAM


Map Reuse and Localization #108

Open Sanyam-Mehta opened 1 year ago

Sanyam-Mehta commented 1 year ago

Let's say I have built a map using a dataset and saved the results to disk. In the next step, I want to use the saved results to localize the camera from a new frame. How could I do this with the DROID-SLAM approach?

My understanding of the code is that during map building, the pose of the next frame is initialized to the estimated pose of the current frame and then optimized as follows (a rough sketch in code follows the list):

  1. Extract edges from the factor graph that are close to the current frame, using the current frame's estimated pose (via `add_proximity_factors` and the frame-distance kernel)
  2. Estimate the motion flow between the current frame and each of the frames connected to it
  3. Extract correlation features in the neighbourhood of each pixel using the estimated motion flow
  4. Update the motion flow using the flow delta obtained from processing the correlation features
  5. Perform local bundle adjustment with the updated motion flow to get updated pose and depth estimates
  6. Repeat until convergence (or for a set number of iterations)
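
For what it's worth, a minimal sketch of that loop as I understand it; every callable it takes (`correlation_lookup`, `update_operator`, `bundle_adjustment`) is an illustrative placeholder, not DROID-SLAM's actual API:

```python
# Illustrative sketch of the per-frame optimization loop described above.
# The callables passed in are placeholders, not DROID-SLAM's real functions.
def optimize_new_frame(edges, flows, poses, depths,
                       correlation_lookup, update_operator, bundle_adjustment,
                       num_iters=4):
    # `edges` are the factor-graph edges already selected around the new frame
    # (step 1), e.g. by a proximity / frame-distance criterion.
    for _ in range(num_iters):
        for (i, j) in edges:
            # steps 2-3: look up correlation features around each pixel's
            # current flow estimate between frames i and j
            corr = correlation_lookup(i, j, flows[(i, j)])
            # step 4: the learned update operator predicts a flow correction
            flows[(i, j)] = flows[(i, j)] + update_operator(corr)
        # step 5: local bundle adjustment refines poses and depths so that
        # reprojections agree with the updated flow
        poses, depths = bundle_adjustment(edges, flows, poses, depths)
    # step 6: fixed number of iterations (or stop early on convergence)
    return poses, depths
```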

Now, the issue with map reuse is that the first step, which extracts the frames in the neighbourhood of the current frame, relies on a pose estimate carried over from the last frame. For pure relocalization against a saved map, this initial pose estimate is not available, and there is no feature-based matching step that could be used to find the frames close to the query frame. In such a scenario, what should be done to localize a frame against the saved map?

This is also related to the question of loop closure: How is it performed in the current setting?

kwea123 commented 1 year ago

For relocalization, people often do an image-based search first to get images that are close to the query image. So in DROID-SLAM you would need to:

  1. save the image/pose pairs of your mapping result in a database
  2. design an image-matching algorithm (there are many in the literature) that gives you the images in the database that are close to your query image (some algorithms even estimate the relative pose quite accurately)
  3. use the retrieved images' poses (optionally multiplied by the estimated relative pose) as the initial pose estimate for your query frame
  4. then you can do local BA (see the sketch below)
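
A minimal sketch of that pipeline, assuming a global-descriptor encoder and a local-BA routine are available; `encode` and `local_ba` below are assumed helpers, not part of DROID-SLAM:

```python
import numpy as np

# Sketch only: `encode` (a global-descriptor model, e.g. a NetVLAD-style
# network) and `local_ba` (a routine that refines a pose against the saved
# map) are assumed helpers here, not functions provided by DROID-SLAM.
def build_database(images, poses, encode):
    # step 1: store a (global descriptor, pose) pair for every mapped keyframe
    return [(encode(img), pose) for img, pose in zip(images, poses)]

def localize(query_image, database, encode, local_ba):
    # step 2: retrieve the mapped keyframe whose descriptor is closest to the query
    query_desc = encode(query_image)
    dists = [np.linalg.norm(query_desc - desc) for desc, _ in database]
    nearest = int(np.argmin(dists))

    # step 3: use the retrieved keyframe's pose as the initial pose estimate
    init_pose = database[nearest][1]

    # step 4: refine the initial estimate with local bundle adjustment
    return local_ba(query_image, init_pose)
```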

Loop closure is done by performing local BA when previous frames are found inside the radius.
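
As a rough sketch of one reading of "inside the radius" (names and thresholds are placeholders; `frame_distance` stands in for a frame-distance measure such as mean induced flow):

```python
# Sketch only: edges to keyframes that are far away in time but whose frame
# distance is small act as loop-closure candidates; optimizing them with
# local BA is what closes the loop. Names and thresholds are placeholders.
def loop_closure_candidates(num_keyframes, cur, frame_distance,
                            temporal_window=3, radius=16.0):
    edges = []
    for j in range(num_keyframes):
        if abs(cur - j) <= temporal_window:
            # temporally adjacent keyframes are already covered by odometry edges
            continue
        if frame_distance(cur, j) < radius:
            edges.append((cur, j))
    return edges
```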

sumitsarkar1 commented 1 year ago

@Sanyam-Mehta You need image-based localization, or appearance-based localization. Check RTabMap.

nnop commented 9 months ago

Are you referring to the timestamp radius when you mention "radius"? In that case, frames inside the radius would relate to odometry rather than loop closure, correct?

@kwea123