paul-michalik / ORB_SLAM2

Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities

Investigate the loss of tracking in some of the KITTI datasets #80

Open paul-michalik opened 6 years ago

paul-michalik commented 6 years ago

We have observed loss of tracking in specific KITTI datasets. This needs to be investigated and fixed.

ghost commented 6 years ago

Q: Why does tracking loss occur? A: It occurs when unrelated neighboring images are encountered. Neighboring images can become unrelated when too few features (key-points) match between them. Description ->

paul-michalik commented 6 years ago
  1. Please enumerate the KITTI datasets for which the loss of tracking is reproducible.
  2. Please fix the typos and the formatting of the text.
  3. Please provide a link to the code where the procedure you describe is implemented:

    The original ORB-SLAM2 provider provides the following method...

From what I understand, tracking gets lost when no association between consecutive frames can be found. The main reason for this appears to be the uncontrolled amount of time required for bundle adjustment in the mapping thread. While the mapping thread is busy, no further frames are tracked; when it becomes available again, the current frame might be too different from the last tracked frame.

Solution:

  1. Provide a buffer for frames between the tracking and mapping threads which ensures that frames are never too far apart; see the sketch after this list. If the hardware is powerful enough to execute the BA in real time, the buffer will always contain only one or a few images. To prevent overflows, the buffer should have an expiry strategy, or it should start persisting frames when its size reaches a limit. This sounds like a non-trivial problem, but I guess we don't have to deal with it right now. Just providing a large enough buffer should prevent the loss of tracking caused by incoherence between images!

  2. Make sure that the data source throttles the invocation of the tracking function to some maximum feasible frequency. For data collected ahead of time the frequency has to be limited anyway, otherwise the only limit is the speed of the IO subsystem :-)
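To make point 1 concrete, here is a minimal sketch of such a bounded buffer. This is only an assumption of how it could look, not existing code in this repo: `FrameBuffer`, `TimestampedFrame` and the drop-oldest expiry policy are all hypothetical choices.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

#include <opencv2/core/core.hpp>

// Hypothetical bounded buffer between the tracking (producer) and
// mapping (consumer) threads. When full, the oldest frame is dropped
// so the consumer never falls too far behind the camera.
struct TimestampedFrame {
    cv::Mat image;
    double timestamp;
};

class FrameBuffer {
public:
    explicit FrameBuffer(std::size_t capacity) : m_capacity{capacity} {}

    void push(TimestampedFrame frame) {
        {
            std::lock_guard<std::mutex> lock{m_mutex};
            if (m_frames.size() == m_capacity) {
                // Expiry strategy: drop the oldest frame. Persisting it
                // to disk instead would also fit here.
                m_frames.pop_front();
            }
            m_frames.push_back(std::move(frame));
        }
        m_not_empty.notify_one();
    }

    TimestampedFrame pop() {
        std::unique_lock<std::mutex> lock{m_mutex};
        m_not_empty.wait(lock, [this] { return !m_frames.empty(); });
        TimestampedFrame frame = std::move(m_frames.front());
        m_frames.pop_front();
        return frame;
    }

private:
    std::size_t m_capacity;
    std::deque<TimestampedFrame> m_frames;
    std::mutex m_mutex;
    std::condition_variable m_not_empty;
};
```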

@PTavse, @shanmukhananda opinions?

ghost commented 6 years ago

@paul-michalik

Please enumerate the KITTI datasets for which the loss of tracking is reproducible.

http://www.cvlibs.net/download.php?file=data_odometry_gray.zip, sequence data_odometry_gray\dataset\sequences\01\image_0

How the original ORB-SLAM2 handles tracking loss:

  1. Each frame (image) has a corresponding timestamp.
  2. This timestamp tells when the frame was created and its time difference from the previous frame.
  3. Each frame should be provided to the tracking thread only once its corresponding timestamp has elapsed:
    • calculate the tracking time of the current frame (ttrack)
    • find the time difference (T) between the next frame's timestamp and the current frame's timestamp
    • if ttrack is less than T, sleep the tracking thread for (T - ttrack) * 1e6 microseconds

Please provide a link to the code where the procedure you describe is implemented:

See lines 98-105 of https://github.com/raulmur/ORB_SLAM2/blob/master/Examples/Monocular/mono_tum.cc
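For convenience, here is a condensed, self-contained paraphrase of that pacing logic. Variable names (ttrack, T, tframe, vTimestamps) follow the linked example; I substitute std::this_thread::sleep_for for the original usleep((T - ttrack) * 1e6), and the surrounding load-and-track loop is omitted.

```cpp
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Sketch of the frame pacing used by the ORB-SLAM2 examples: after
// tracking frame ni, sleep until the next frame's timestamp is due.
// tframe is the current frame's timestamp and ttrack the measured
// tracking time, both in seconds.
void PaceToTimestamps(const std::vector<double>& vTimestamps, std::size_t ni,
                      double tframe, double ttrack) {
    const std::size_t nImages = vTimestamps.size();

    // T: wall-clock gap to the next frame, taken from the dataset timestamps.
    double T = 0.0;
    if (ni < nImages - 1)
        T = vTimestamps[ni + 1] - tframe;
    else if (ni > 0)
        T = tframe - vTimestamps[ni - 1];

    // If tracking finished faster than the frame interval, wait out the rest.
    if (ttrack < T)
        std::this_thread::sleep_for(std::chrono::duration<double>(T - ttrack));
}
```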

shanmukhananda commented 6 years ago

@paul-michalik

I think keyframe selection should be based purely on the geometric relationship and the change of visual features between frames (this should already be handled in ORB-SLAM). Selecting keyframes based on the state of the mapping thread is not nice. Using a buffer approach is OK; with it we can get consistent behavior between consecutive runs. If the tracking thread needs the keyframe poses after local bundle adjustment to estimate the upcoming frame's pose, then another buffer might be required from mapping to tracking, but I doubt this. With the buffer approach, the actual car pose in the real world and the car pose in the map (viewer) might not match; we need to test and see.

Also, it could help if we bring in the GPU. OpenCV definitely supports GPU acceleration, but BA on the GPU? That would mean running the g2o solvers on the GPU, and I'm not sure that even makes sense.

paul-michalik commented 6 years ago

I think keyframe selection should be based purely on the geometric relationship and the change of visual features between frames (this should already be handled in ORB-SLAM).

This assumes that we don't lose too much information when we consider geometric criteria only. However, we could attempt another strategy, which could make sense at least for "dense" semantic features: let ORB-SLAM2 pick the frames based on its own criteria, and detect semantic features in the selected frames afterwards. This would require a "callback" to a feature detector from inside the pipeline, and it would not work for non-visual readings such as GPS or inertial sensor data.
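A rough sketch of what such a callback hook could look like, purely as an illustration: KeyFrameNotifier and KeyFrameCallback are hypothetical names, and nothing like this exists in the pipeline yet.

```cpp
#include <functional>
#include <utility>

#include <opencv2/core/core.hpp>

// Hypothetical hook: the SLAM pipeline selects keyframes by its own
// (geometric) criteria and then hands each selected frame to a
// user-supplied semantic feature detector.
using KeyFrameCallback =
    std::function<void(const cv::Mat& image, double timestamp)>;

class KeyFrameNotifier {
public:
    void SetCallback(KeyFrameCallback callback) {
        m_callback = std::move(callback);
    }

    // Would be invoked by the pipeline wherever a new keyframe is
    // created, e.g. from the tracking thread after keyframe insertion.
    void OnKeyFrameCreated(const cv::Mat& image, double timestamp) const {
        if (m_callback)
            m_callback(image, timestamp);
    }

private:
    KeyFrameCallback m_callback;
};
```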