raulmur / ORB_SLAM2

Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities

Heavily constraining camera roll & pitch and optimising around it #251


SamuelDudley commented 7 years ago

Hi! Firstly, awesome work on this algorithm, its implementation, and making the code base open source! Also, thanks to all the people who follow this project and donate their time and expertise to help answer questions like this one :)

Background:

As a side project to my day job (in my free time), and in support of the Autopilot project, I'm attempting to use ORB-SLAM2 as a position estimator on a fixed-wing UAV for GPS-free navigation. I have a dataset taken from a plane at ~5 Hz with associated autopilot NED + attitude estimates (from the autopilot's 22-state EKF, which included GPS fusion during the data acquisition flight). This autopilot data is as close to a drift-free ground truth as I can obtain for this problem. I have evaluated ORB-SLAM2 on this dataset with no modifications, and while it does quite a good job tracking translation over ground and heading (a function of the nadir camera) over the ~100 km of flat terrain, over time the world starts to 'curl up and over' as the pitch estimate diverges. Even without GPS the autopilot does a very good job of knowing the roll, pitch and, to a lesser extent, yaw of the aircraft. If the ORB-SLAM solution were constrained in roll and pitch it would do significantly better at matching the autopilot NED estimate and would be a very useful navigation input...

screenshot from 2017-01-12 00 51 34

screenshot from 2017-01-12 00 50 52

Problem:

I want to aid the ORB-SLAM algorithm by supplying 'true' camera attitude estimates with every camera frame. Specifically, I want to constrain the attitude of the camera when the algorithm runs the graph optimisations. Ideally I would like to constrain the roll & pitch of the camera estimate to my supplied values and have the bundle adjustment work only to optimise translation, map points and, to a lesser extent, yaw. As I know very little about g2o and graph optimisation in general, I simply don't know if this is possible... Not knowing any better, and assuming that it is indeed possible, my first thought was to construct an information matrix which will do what I am attempting, e.g. apply a much higher 'trust' weighting to the roll and pitch measurements while optimising the remaining state variables. Is this a bad / poorly conceived idea, or could it work?
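To make the idea concrete, here is a rough sketch of the kind of prior edge I'm imagining, built on types ORB-SLAM2 already ships in Thirdparty/g2o (BaseUnaryEdge, VertexSE3Expmap and SE3Quat are real; the edge class itself is hypothetical and untested):

```cpp
// Hypothetical unary pose-prior edge (not part of ORB-SLAM2); the base
// types come from ORB-SLAM2's Thirdparty/g2o. The measurement is the
// autopilot pose expressed in the ORB-SLAM world frame.
#include "Thirdparty/g2o/g2o/core/base_unary_edge.h"
#include "Thirdparty/g2o/g2o/types/types_six_dof_expmap.h"

class EdgePosePrior
    : public g2o::BaseUnaryEdge<6, g2o::SE3Quat, g2o::VertexSE3Expmap>
{
public:
    EIGEN_MAKE_ALIGNED_OPERATOR_NEW

    void computeError()
    {
        const g2o::VertexSE3Expmap* v =
            static_cast<const g2o::VertexSE3Expmap*>(_vertices[0]);
        // 6-vector in SE3Quat::log ordering: [rot_x, rot_y, rot_z, t_x, t_y, t_z].
        // The information matrix then decides how strongly each component is held.
        _error = (_measurement.inverse() * v->estimate()).log();
    }

    // No analytic Jacobian supplied: g2o falls back to numeric differentiation.
    virtual bool read(std::istream&) { return false; }
    virtual bool write(std::ostream&) const { return false; }
};
```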

What I have done so far:

I have determined the ORB-SLAM world scale and initial rotation post-initialisation and appropriately scaled and rotated the autopilot data into this coordinate system (short video here). I have then experimented with replacing the velocity model with the autopilot inter-frame translation and rotation (scaled, rotated and translated into the ORB-SLAM frame). In this experiment I am effectively feeding the algorithm the GPS-aided EKF 'correct' solution, yet it still diverges exactly as with the standard linear velocity model. This is because it only uses these values as a starting point for the optimisation process and ultimately determines what it considers to be the 'correct' pose, losing track of which way 'down' is in the process.
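For reference, the experiment amounts to something like this inside Tracking::TrackWithMotionModel (mVelocity, mCurrentFrame and mLastFrame are real ORB-SLAM2 members; GetAutopilotRelativeTcw is a stand-in for my own plumbing, not an ORB-SLAM2 function):

```cpp
// Seed the pose from the autopilot's inter-frame motion instead of the
// constant-velocity model (mCurrentFrame.SetPose(mVelocity*mLastFrame.mTcw)).
cv::Mat Tcl = GetAutopilotRelativeTcw();      // hypothetical: autopilot motion,
                                              // already scaled/rotated into the
                                              // ORB-SLAM frame
mCurrentFrame.SetPose(Tcl * mLastFrame.mTcw); // seed for this frame's pose

// ...but Optimizer::PoseOptimization(&mCurrentFrame) still runs afterwards
// and is free to pull the pose away from this seed, which is why the
// divergence is unchanged.
```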

Does anyone have thoughts on whether it is possible to constrain roll and pitch during the optimisation process and, if so, on the best method to do so?

PS. I have also (poorly) implemented some Python bindings using boost::python, which you can find here. They are written to work with my hacked branch of ORB-SLAM2 but might make a good starting point if someone is looking to do something similar.

Thanks! Sam

laxnpander commented 7 years ago

Hey again :) Definitely a cool project you are working on! I can't help you in detail as, you might remember, I am an ORB-SLAM noob. But maybe I can help explain the drift. As you use a monocular camera, your depth information per measurement is not independent and is reconstructed by considering previous frames (this is bad). You have a short stereo baseline for triangulation, in general few observations per map point, and therefore high uncertainty in your estimated altitude. I don't see much chance of improving this by only fixing rotations, as your scenario is the worst case for visual SLAM. From a metrology point of view you might have a chance by improving your calibration procedure / optical system. Better triangulated points mean a lower tendency to drift. This also applies with higher frame rates and more observations per point, btw. Additionally, fixing the rotations does not sound 100% right to me, as you destroy the independence of your measurements. You might instead think about fusing your IMU and ORB-SLAM data in an EKF. So far from my point of view. You should definitely try your approach, as these are only assumptions. But you should be aware that your general prerequisites are quite difficult for every variation of bundle adjustment.

SamuelDudley commented 7 years ago

Hey, thanks for the input :+1: I understand the scale drift issue you mentioned and that my dataset is operating in worst-case conditions (nadir camera observing a planar scene)... However, I feel that what I am observing is more than just scale drift... If you have a look at the screen captures you will see that ORB-SLAM did a pretty good job of estimating the height of the camera above the local scene. That is to say, the distance between the map points and the camera position remains quite constant during the ~50 km straight leg.

The 'bowling' I'm observing seems to be heavily related to the camera parameters. By manually tweaking the camera calibration values I can cause the ORB world to bowl down rather than up, which leads me to believe the effect could be reduced with a better calibration. However, quite a bit of effort was already put into this calibration, so I don't feel it is a viable route to drastically improving the 'bowling' issue.

This aircraft was carrying 4 cameras, and when using a sideways-facing camera (rather than the nadir one) the yaw tracking performance is reduced (as expected), while the world 'bowling' effect is somewhat reduced (because there is greater depth variation in the scene?). This leads me to believe the problem would shrink if I modified the code to accept images from a nadir 180-degree fisheye lens; however, I would like to focus on incorporating the autopilot's accurate roll and pitch estimates as a first effort.

screenshot from 2017-01-25 11 04 11
This is the exact same flight as in the first post, but with ORB-SLAM running on a sideways-looking camera (rather than nadir).

So while the local scene height, scale and total odometry estimate are okay for the nadir camera (within reason), the attitude of the camera is free to 'run away', causing the world 'bowling'. I think using the word 'fix' was misleading: I don't want to fix the camera rotations, but rather have ORB-SLAM heavily favour adjusting the world points and camera translation over modifying the camera attitude. This is because the autopilot can supply drift-free roll and pitch estimates from its own internal EKF, which will always be more accurate than the visual SLAM attitude estimate.

SamuelDudley commented 7 years ago

In case anyone is attempting the same thing, I will be implementing the recommended solution here, except that I will supply larger values for the orientation elements of the information matrix and smaller values for the position elements. If anyone has further suggestions feel free to chime in, but for now I'll close this issue.
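For concreteness, the weighting I have in mind would look roughly like this, assuming a 6-DoF prior edge (like the sketch earlier in the thread) whose error vector uses the rotation-first ordering of SE3Quat::log in the bundled g2o, and a nadir camera so that rotation about the optical (z) axis is heading; the class name, vertex id and magnitudes are illustrative only:

```cpp
// Illustrative only: error ordering is [rot_x, rot_y, rot_z, t_x, t_y, t_z].
EdgePosePrior* edge = new EdgePosePrior();  // hypothetical class from above
edge->setVertex(0, dynamic_cast<g2o::OptimizableGraph::Vertex*>(
                       optimizer.vertex(keyframeId)));  // the KF pose vertex
edge->setMeasurement(priorTcw);  // autopilot pose as g2o::SE3Quat

Eigen::Matrix<double,6,6> info = Eigen::Matrix<double,6,6>::Identity();
info(0,0) = 1e6;   // roll  : trust the autopilot heavily
info(1,1) = 1e6;   // pitch : trust the autopilot heavily
info(2,2) = 1e-2;  // yaw   : mostly left to the optimiser
info.block<3,3>(3,3) = 1e-3 * Eigen::Matrix3d::Identity(); // translation: free
edge->setInformation(info);
optimizer.addEdge(edge);
```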

szymonkulpinski commented 5 years ago

Hi,

I am experiencing the same 'bowling' issue in my application, which is also a nadir camera observing a planar scene. Did you manage to solve it? Unfortunately I don't have access to reliable GPS data and have to try to solve it by adjusting the parameters. The only upside is that I don't have to do it in real time. Do you have any idea how I could get rid of that drift, or at least minimise it?

Thank you in advance!

laxnpander commented 5 years ago

@szymonkulpinski: I experienced the same effect later, after my initial post here. If I remember correctly it was an issue with the camera parameters, as Samuel also suggested. So maybe you have the option to identify your camera parameters more accurately and then rerun it?

szymonkulpinski commented 5 years ago

@laxnpander: You were right. I adjusted the camera parameters and now it works well. Thank you!

RoyAmoyal commented 5 months ago

@szymonkulpinski @laxnpander @SamuelDudley

Hey, I am trying to work on a similar project. If I understood you correctly, you use the monocular version of ORB-SLAM2 and initialize the scale from the altitude of the drone (since the altitude is the distance from the downward-looking camera to the scene, under the assumption that the scene is planar).

Why can't we use the altitude information of the drone (from lidar, radar, etc.) to run the RGB-D version of ORB-SLAM2? Isn't that supposed to solve the scale drift of the monocular camera? (Of course, the planar-scene assumption will cause another drift, but if the drone is flying high enough and not above 'tall' objects, it's supposed to be fine, I guess.)

I will be happy to hear your thoughts. Thanks!

laxnpander commented 5 months ago

@RoyAmoyal I don't quite understand what you are saying. LiDAR as in 1D LiDAR? Height information is not sufficient to use RGB-D; for RGB-D you'd need dense scene information. There are LiDARs for that (e.g. Livox), but only recently in a small enough form factor. You can of course use height measurements to deal with scale drift, but it's not straightforward to implement, as you have to incorporate it into the factor graph, and it's not really useful for a lot of other applications because of the strong assumption that scene depth = height.

RoyAmoyal commented 5 months ago


@laxnpander: Yeah, you're right; I will explain again. If I have a rangefinder (1D lidar, radar, etc.) on a drone that measures the drone's height above the ground, and a camera on the drone looking at the ground, I can relax the problem under the following conditions:

  1. The drone is flying high enough (say 300+ metres, just for the example).
  2. The heights of the objects in the scene (in the camera frames) are less than, say, 10 metres.

Then I can assume that the scene in the camera frame is planar, because the heights of the objects in the scene are negligible compared to the camera's height above them (e.g. 10 m objects seen from 300 m contribute at most a ~3% relative depth error).

So if the scene is planar, I can assume that all the pixels in a frame have the same depth value, equal to the distance from the camera to the ground (the altitude we get from the rangefinder / 1D lidar, etc.). Of course, I am assuming for the moment that the camera is parallel to the ground (I can deal with pitch and roll later).

Under those conditions, can't I use the RGB-D version of ORB-SLAM2 and, for each frame, provide a depth channel in which all pixels have the same value, equal to the altitude of the drone? (The depth can differ from frame to frame, matching the current altitude.)
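Concretely, something like this is what I have in mind (System::TrackRGBD is the real ORB-SLAM2 entry point; the helper function and the assumption that DepthMapFactor is 1.0 in the settings file are mine):

```cpp
// Sketch of the idea: feed ORB-SLAM2's RGB-D interface a synthetic depth
// image in which every pixel equals the rangefinder altitude for that frame.
// Assumes DepthMapFactor: 1.0 in the settings YAML, so the CV_32F depth is
// interpreted directly as metres.
#include <opencv2/core/core.hpp>
#include "System.h"

void TrackPlanarFrame(ORB_SLAM2::System &slam, const cv::Mat &rgb,
                      float altitude,   // rangefinder reading [m] for this frame
                      double timestamp)
{
    // The planar-scene assumption in code: one constant depth for all pixels.
    cv::Mat depth(rgb.rows, rgb.cols, CV_32F, cv::Scalar(altitude));
    slam.TrackRGBD(rgb, depth, timestamp);
}
```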

And as I asked in the previous comment: isn't that supposed to help with the scale drift of the monocular camera? (The planar-scene assumption will probably introduce another accumulated error, but maybe it is relatively smaller than the drift from the global scale assumption.)

Thanks

laxnpander commented 5 months ago

Ah, I see. In theory it sounds somewhat possible, but that is a VERY hacky approach. You may get away with it, but more likely it will fall apart somewhere along the way. You are making a lot of assumptions, and all of them will be slightly off, accumulating into a significant error that ORB-SLAM will not know how to compensate for. Try it; my guess is it will not work well enough.