raulmur / ORB_SLAM2

Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities

Problem with initialization and big rotation movement in monocular hand-held case #194

Open shimiaoli opened 7 years ago

shimiaoli commented 7 years ago

Hi

Thank you @raulmur very much for sharing this wonderful code.

I have encountered two problems when I test ORB_SLAM2 on monocular hand-held camera video data.

  1. Initialization seems to be random every time I run it, even on the same video sequence. Sometimes the program happens to get a good initialization and it continues with a correct trajectory; sometimes it is not so lucky and the subsequent estimated trajectory is wrong. Is this randomness due to the RANSAC process during initialization? Also, when the scene is planar, initialization seems to be difficult.

  2. When tested on a video sequence with big camera rotations (rotating to face a new area), SLAM tracking tends to get lost and no new points are created in the map. Relocalization cannot succeed in this case, since the camera has rotated to face a new, unmapped area.

Any suggestions on how to fix these two problems?

Thank you.

Shimiao

AlejandroSilvestri commented 7 years ago

Hi, both problems are inherent to visual monocular SLAM. The solution is not in the algorithm but in the video: re-record it with some considerations in mind.

Tracking works only on mapped points. New map points are created via triangulation, so you need frames taken from different positions; rotation alone doesn't achieve that. Pure rotations in uncharted areas are "very bad" for visual monocular SLAM, because they don't provide enough information to triangulate new map points, and they change the scene much faster than translations do.
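
To illustrate the point with a minimal sketch (OpenCV, not ORB-SLAM2's actual code): triangulation intersects two viewing rays, and with zero baseline the rays coincide, so depth is unobservable.

```cpp
// Minimal two-view triangulation sketch with OpenCV (illustrative only).
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <iostream>

int main() {
    // Projection matrices P = K [R | t], with identity intrinsics for brevity.
    cv::Mat P1 = cv::Mat::eye(3, 4, CV_64F);      // first camera at the origin
    cv::Mat P2 = cv::Mat::eye(3, 4, CV_64F);
    P2.at<double>(0, 3) = -0.1;                   // 10 cm lateral baseline

    // A 3D point at 5 m depth, projected into both views (ideal, noise-free).
    cv::Mat X = (cv::Mat_<double>(4, 1) << 0.5, 0.2, 5.0, 1.0);
    cv::Mat x1 = P1 * X, x2 = P2 * X;
    cv::Mat p1 = (cv::Mat_<double>(2, 1) << x1.at<double>(0) / x1.at<double>(2),
                                            x1.at<double>(1) / x1.at<double>(2));
    cv::Mat p2 = (cv::Mat_<double>(2, 1) << x2.at<double>(0) / x2.at<double>(2),
                                            x2.at<double>(1) / x2.at<double>(2));

    cv::Mat Xh;
    cv::triangulatePoints(P1, P2, p1, p2, Xh);    // DLT triangulation
    Xh.convertTo(Xh, CV_64F);
    Xh /= Xh.at<double>(3);                       // dehomogenize
    std::cout << "recovered point: " << Xh.t() << std::endl;
    // With a pure rotation, P2 would have a zero translation column, both
    // rays would be identical, and the linear system would be degenerate.
    return 0;
}
```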

It's more or less the same for initialization: you need to move, not only rotate. Initialization is random when you have few opportunities to initialize. When you record the video, aim below the horizon so you see the floor near you, and move laterally. The different apparent velocities of near and far points help triangulation (this is an oversimplified argument).

ORB-SLAM has two initialization algorithms: one for general 3D (volumetric) feature configurations and another for coplanar features. They are the best algorithms available, sort of. Coplanar initialization tends to be more difficult in my experience.
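
For illustration, a rough sketch of that two-model selection with OpenCV (not ORB-SLAM's actual Initializer, which scores both models with a symmetric transfer error; the RANSAC inlier counts below are just a crude stand-in for those scores):

```cpp
// Hedged sketch of ORB-SLAM's two-model initialization idea: fit both a
// homography (planar scene) and a fundamental matrix (general 3D scene) to
// the matches, then pick the better-supported model.
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <algorithm>
#include <vector>

// pts1/pts2: matched keypoints from the two candidate initialization frames.
bool preferHomography(const std::vector<cv::Point2f>& pts1,
                      const std::vector<cv::Point2f>& pts2) {
    std::vector<uchar> inH, inF;
    cv::findHomography(pts1, pts2, cv::RANSAC, 3.0, inH);
    cv::findFundamentalMat(pts1, pts2, cv::FM_RANSAC, 3.0, 0.99, inF);

    double sH = std::count(inH.begin(), inH.end(), 1);
    double sF = std::count(inF.begin(), inF.end(), 1);
    // The ORB-SLAM paper selects the homography when
    // R_H = S_H / (S_H + S_F) > 0.45, i.e. for (near-)planar scenes.
    return sH / (sH + sF) > 0.45;
}
```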

When tracking, don't suddenly rotate towards new areas; rotate slowly and translate simultaneously. ORB-SLAM2 needs some time to make new keyframes and add new points to the map, and this depends heavily on your processing power. With low power, slowing down the video helps. In difficult places (tracking loss) you can play your video forwards and backwards and watch ORB-SLAM add more keyframes, and therefore more map points, in the same area; that helps tracking into the next new area.

I forgot the most important issue: calibrate your camera and check it visually. Take a picture of a building with windows, or anything with perpendicular straight 3D lines in perspective, so you can check whether your distortion coefficients give you an undistorted image with straight lines.
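
Something like this quick check works (a minimal OpenCV sketch; the intrinsics and distortion coefficients below are placeholders, use the values from your own calibration):

```cpp
// Visual calibration check: undistort a photo containing straight 3D lines
// and verify that the lines come out straight.
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/calib3d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/highgui.hpp>

int main() {
    cv::Mat img = cv::imread("building.jpg");   // picture with straight edges
    // Placeholder calibration values; replace with your own.
    cv::Mat K = (cv::Mat_<double>(3, 3) << 700.0,   0.0, 320.0,
                                             0.0, 700.0, 240.0,
                                             0.0,   0.0,   1.0);
    cv::Mat dist = (cv::Mat_<double>(1, 5) << -0.28, 0.07, 0.0, 0.0, 0.0);

    cv::Mat undistorted;
    cv::undistort(img, undistorted, K, dist);
    cv::imshow("undistorted", undistorted);     // edges should now be straight
    cv::waitKey(0);
    return 0;
}
```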

shimiaoli commented 7 years ago

Thank you @AlejandroSilvestri for the insightful answer.

I still have one doubt: during pure camera rotation (such as panning into a new area), we should be able to track feature points by a homography and recover the camera motion by decomposing the estimated homography. This means that even if no new points can be triangulated and added to the map, because the camera has no translation, we should still be able to keep tracking the camera motion. However, in ORB-SLAM it seems the system loses track of both the scene points and the camera motion in the pure panning case.
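
For example, something along these lines (an OpenCV sketch of the idea, not ORB-SLAM code; it assumes a calibrated camera `K` and matched points between consecutive frames):

```cpp
// Sketch: under pure rotation, image motion follows the infinite homography
// H = K * R * K^-1, so the rotation R is recoverable without triangulation.
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

cv::Mat rotationFromPanning(const std::vector<cv::Point2f>& pts1,
                            const std::vector<cv::Point2f>& pts2,
                            const cv::Mat& K) {               // 3x3, CV_64F
    cv::Mat H = cv::findHomography(pts1, pts2, cv::RANSAC, 3.0);
    cv::Mat R = K.inv() * H * K;    // equals R only up to scale and noise
    cv::SVD svd(R);                 // project onto SO(3): R <- U * V^T
    return svd.u * svd.vt;
}
```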

AlejandroSilvestri commented 7 years ago

That's a great idea.

ORB-SLAM2 has a visual odometry mode that estimates motion while tracking is lost, but as far as I can remember it only works with translation, not with pure rotation.

When tracking is lost, you could assume you are rotating, but there is no way to verify that. It would have to be taken as a premise: when lost while rotating, assume rotation-only motion.

But in order for the system to be able to add new points, it has to relocalize itself first: the rotation has to continue until enough known map points are in the picture. So, in both cases you have to wait for relocalization.

shimiaoli commented 7 years ago

Thank you, @AlejandroSilvestri.

For comparison, I have run EKF-based MonoSLAM (with inverse-depth initialization) on the same video sequence. Under MonoSLAM, the camera trajectory can be tracked continually even during pure camera panning, though the positions of the newly initialized points do not converge. (MonoSLAM with inverse depth initializes a point immediately upon seeing it, but with large depth uncertainty.) However, although it is not obviously observable, I suspect the scale may drift after the panning.
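
For reference, the inverse-depth idea looks roughly like this (a sketch after Civera et al.'s parameterization, not MonoSLAM's actual code):

```cpp
// A feature is stored as the camera position where it was first seen, the
// direction of the observation ray, and an inverse depth rho with large
// initial uncertainty, so it can constrain the pose immediately.
#include <opencv2/core.hpp>
#include <cmath>

struct InverseDepthPoint {
    cv::Point3d c;      // camera optical center at first observation
    double theta, phi;  // azimuth / elevation of the observation ray
    double rho;         // inverse depth 1/d; rho near 0 means "at infinity"
};

// Euclidean location: X = c + (1 / rho) * m(theta, phi).
cv::Point3d toEuclidean(const InverseDepthPoint& p) {
    cv::Point3d m(std::cos(p.phi) * std::sin(p.theta),
                  -std::sin(p.phi),
                  std::cos(p.phi) * std::cos(p.theta));
    return p.c + (1.0 / p.rho) * m;
}
```

Because rho near zero represents a point close to infinity with a well-behaved Gaussian, even untriangulated points still constrain the rotation, which would explain what I observed during panning.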

Of course, ORB-SLAM generally outperforms MonoSLAM in accuracy and speed. But it seems MonoSLAM with inverse-depth initialization handles tracking in the pure rotation case better.

ank700 commented 7 years ago

Hello @AlejandroSilvestri, in one of the comments you say that pure rotations in uncharted areas are "very bad" for visual monocular SLAM because they don't provide enough information to triangulate new map points, and the scene changes very fast. Can you please explain what exactly you mean by "very bad"?

If I use a wide field-of-view lens, say 100 degrees or more, then the scene won't change as fast as with a small-FOV lens. This might provide more information for triangulation.

Thanks

AlejandroSilvestri commented 7 years ago

Hi @ank700, to triangulate a point you need to observe it from two different positions.

In a pure rotation your position doesn't change, so you can't triangulate new points until you displace the camera. In the meantime you can lose tracking, which happens when you stop seeing known points.

So, since a wide lens covers a bigger angle, you can rotate more without losing sight of known points.

BTW, for a fisheye lens I changed the distortion model (modifying the code) and it worked excellently.
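
For illustration, one way to swap the model using OpenCV's equidistant fisheye functions (not necessarily my exact patch, which touched ORB-SLAM2's keypoint undistortion step):

```cpp
// Undistort raw keypoints with the fisheye model instead of the default
// radial-tangential model that cv::undistortPoints assumes.
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

// kps: raw keypoint positions; K: 3x3 intrinsics; D: 4 fisheye coefficients.
std::vector<cv::Point2f> undistortFisheye(const std::vector<cv::Point2f>& kps,
                                          const cv::Mat& K, const cv::Mat& D) {
    std::vector<cv::Point2f> out;
    // Passing P = K keeps the result in pixel coordinates.
    cv::fisheye::undistortPoints(kps, out, K, D, cv::noArray(), K);
    return out;
}
```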

AlejandroSilvestri commented 7 years ago

@shimiaoli, you are right, I must correct myself: pure rotations are especially bad for feature-based (as opposed to filter-based) visual monocular SLAM.

Filter-based visual monocular SLAM (like MonoSLAM) can bear pure rotations better than non-filter-based approaches because, as you pointed out, points are added to the map immediately.

However, pure rotations still don't give enough information to properly triangulate points, so the system won't be able to compute the pose correctly based only on new points observed during a pure rotation.

Happy new year!

ank700 commented 7 years ago

@AlejandroSilvestri, do you mean that by changing the distortion model for a fisheye lens you are able to perform pure rotations? Can you share the changes that you have made?

If I use an omnidirectional camera that captures 360-degree panorama shots, can the problem of pure rotation be solved?

AlejandroSilvestri commented 7 years ago

@ank700, with an omnidirectional camera these pure-rotation problems would be gone. But I believe ORB-SLAM2 needs a big modification to use such cameras.

ORB-SLAM2 assumes a pinhole camera with a planar projection. That projection takes advantage of homogeneous coordinates and a linear camera matrix, which aren't compatible with omnidirectional cameras. Adaptation is needed.
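
For example, the projection ORB-SLAM2 relies on is essentially this (a sketch; the function name is just illustrative):

```cpp
// Planar pinhole projection in homogeneous coordinates: u ~ K (R X + t).
#include <opencv2/core.hpp>

cv::Point2d projectPinhole(const cv::Mat& K,   // 3x3 intrinsics, CV_64F
                           const cv::Mat& R,   // 3x3 rotation, world->camera
                           const cv::Mat& t,   // 3x1 translation
                           const cv::Mat& X) { // 3x1 world point
    cv::Mat u = K * (R * X + t);               // homogeneous pixel coordinates
    // Only valid for points in front of the camera (z > 0); points behind
    // the camera, which an omnidirectional sensor does see, have no finite
    // projection onto a single image plane.
    return cv::Point2d(u.at<double>(0) / u.at<double>(2),
                       u.at<double>(1) / u.at<double>(2));
}
```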

gabrielmirandat commented 6 years ago

Hello!

I am working with a Pioneer 3-AT mobile robot, which has non-holonomic (car-like) movement but can rotate around its vertical axis. The camera is fixed at the front, so even when I rotate around the robot's axis the camera has some translation. In my tests I drive away and, when I need to turn, I rotate while translating, and even this way I always lose tracking. I am translating at around 0.2 m/s and rotating at around 0.1 m/s. What can I do to overcome this? Thanks in advance!

AlejandroSilvestri commented 6 years ago

@gabrielmirandat

Increase the turning radius. Remember, turning is a problem only while mapping a new area; there is no problem when turning in an already mapped area.