raulmur / ORB_SLAM2

Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities

Scale issue #478

Open romdos opened 6 years ago

romdos commented 6 years ago

I'm not very familiar with the "scale" issue in monocular methods. I've reviewed the issues in this repository related to scale and learned that the scale is initialized differently on every launch. But what does that mean? I am aware that it's impossible to recover the map scale using only one camera, and as I understand it, this is because the camera displacement is unknown. In this method (according to the paper), during map initialization we detect sufficient parallax (how? how large must it be?) and triangulate. Then (in local mapping) we must compute new poses and add new points to the map via triangulation. So we have to know the displacement length, right? As far as I know, from two views we can estimate only the unit vector of the camera displacement. If we pick its length arbitrarily, triangulated points may lie very far away (or very close). Correct me if I am wrong: the bundle adjustment procedure refines such odd triangulations and makes the map consistent. Sorry for my unawareness of some visual mapping issues.

AlejandroSilvestri commented 6 years ago

Hi @romdos

ORB-SLAM2 arbitrarily defines the scale at initialization, using the median scene depth. In practice, the scale is different every time you initialize orb-slam2. That doesn't mean it is purposely random.

The map initializes once there is enough parallax. Parallax is about angles, not distances. When you initialize from two frames, the displacement is usually taken as unitary, but orb-slam2 changes this, taking the distance unit from the median depth of the newly created map. This two-view initialization is solved with epipolar geometry (a fundamental matrix or a homography).

Visual SLAM triangulates new points every frame: it first calculates the frame pose, then triangulates new points. In visual SLAM the displacements aren't unitary, unlike the unit-length translation you get from two-view relative pose.
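Concretely, right after the initial triangulation ORB-SLAM2 rescales the entire initial map so that the median scene depth becomes 1; condensed from Tracking::CreateInitialMapMonocular() in Tracking.cc:

```cpp
// Median depth of the triangulated points, seen from the first keyframe.
float medianDepth = pKFini->ComputeSceneMedianDepth(2);
float invMedianDepth = 1.0f / medianDepth;

// Scale the initial baseline: rescale the second keyframe's translation.
cv::Mat Tc2w = pKFcur->GetPose();
Tc2w.col(3).rowRange(0,3) = Tc2w.col(3).rowRange(0,3) * invMedianDepth;
pKFcur->SetPose(Tc2w);

// Scale all triangulated points by the same factor.
vector<MapPoint*> vpAllMapPoints = pKFini->GetMapPointMatches();
for (size_t iMP = 0; iMP < vpAllMapPoints.size(); iMP++)
{
    if (vpAllMapPoints[iMP])
    {
        MapPoint* pMP = vpAllMapPoints[iMP];
        pMP->SetWorldPos(pMP->GetWorldPos() * invMedianDepth);
    }
}
```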

romdos commented 6 years ago

It's not clear to me what "recently created map" means if we launch the system for the first time, right after installation. As I understand, there is a parallax threshold that tells us when to start initializing. By initialization I mean that we triangulate correspondences once we know the 6 DoF of the camera displacement (assuming the translation length to be 1).

Visual SLAM triangulates new points every frame: it first calculates the frame pose, ...

How? As I noted, there is a constant velocity motion model. What is this? Does the velocity have a fixed value?

AlejandroSilvestri commented 6 years ago

@romdos

ORB-SLAM2 repeatedly tries to initialize until a quality check is passed. First, the relative pose is computed from a fundamental matrix F or a homography matrix H; second, points are triangulated. Several criteria are then evaluated; one of them is the parallax check: triangulated points are kept only if cosParallax < 0.99998.
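For reference, the parallax of a triangulated point is the angle between its two viewing rays; condensed from Initializer::CheckRT() in Initializer.cc (the 0.99998 threshold corresponds to roughly 0.36°):

```cpp
// p3dC1: triangulated point in camera-1 coordinates;
// O1, O2: the two camera centers in the same frame.
cv::Mat normal1 = p3dC1 - O1;   // ray from camera 1 to the point
float dist1 = cv::norm(normal1);
cv::Mat normal2 = p3dC1 - O2;   // ray from camera 2 to the point
float dist2 = cv::norm(normal2);

// Cosine of the angle between the two rays: a value close to 1 means
// nearly parallel rays, i.e. too little parallax to trust the depth.
float cosParallax = normal1.dot(normal2) / (dist1 * dist2);
bool enoughParallax = cosParallax < 0.99998f;
```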

Visual SLAM estimates the camera pose from the known 3D positions of the visualized points. For performance, orb-slam2 needs a good initial guess of the camera pose: hence the motion model.

The constant velocity model means orb-slam2 takes as its initial pose estimate the pose the camera would have if it kept moving in the same direction and at the same velocity as between the last two frames.
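In code, the "velocity" is simply the relative pose between the last two frames, reapplied to predict the next one; condensed from Tracking.cc (Track() and TrackWithMotionModel()):

```cpp
// After a successful track: store the relative motion between the last
// two frames as the "velocity" (a 4x4 SE(3) transform, not a fixed value).
cv::Mat LastTwc = cv::Mat::eye(4, 4, CV_32F);
mLastFrame.GetRotationInverse().copyTo(LastTwc.rowRange(0,3).colRange(0,3));
mLastFrame.GetCameraCenter().copyTo(LastTwc.rowRange(0,3).col(3));
mVelocity = mCurrentFrame.mTcw * LastTwc;   // Tcw(current) * Twc(last)

// On the next frame: predict its pose by reapplying that motion,
// then refine it by matching map points around the prediction.
mCurrentFrame.SetPose(mVelocity * mLastFrame.mTcw);
```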

Shubhamvithalani commented 5 years ago

@AlejandroSilvestri In the case of a monocular camera, if the scale is arbitrary (different in every execution), can we somehow know the scale of each initialisation? That is, can we get the scale factor?

poine commented 5 years ago

You need another sensor to evaluate the scale. I have successfully initialized absolutely scaled orbslam maps using my robot's odometers.


Shubhamvithalani commented 5 years ago

So you take the output from the sensor (odometry) at every initialisation over a fixed distance (possibly, like a calibration) and determine the scale of that initialisation, then use the same scale for subsequent readings?

poine commented 5 years ago

When initializing the map (in Tracking.cc, Tracking::CreateInitialMapMonocular()), instead of scaling the map by the inverse of the median depth, I scale it by the length given by my odometry.

```cpp
// Transform initial map
if (m_motion_model.has_external_pose()) {
    TRACE_TRACKING(1, "I have external poses\n"
        << m_motion_model.get_recorded_ext_pose() << "\n"
        << m_motion_model.get_external_pose() << "\n");
    double scale;
    _ComputeInitialMapScaleFromExtPoses(&scale);
    mpMap->Scale(scale);
    cv::Mat T_e2m(4, 4, CV_32F);
    _ComputeInitialMapTransform(&scale, T_e2m,
        m_motion_model.get_recorded_ext_pose(),
        m_motion_model.get_external_pose(),
        pKFini, pKFcur);
    mpMap->Transform(T_e2m);
} else if (m_motion_model.has_external_velocity()) {
    TRACE_TRACKING(1, "I have an external velocity "
        << m_motion_model.get_external_velocity() << " ");
    double scale;
    _ComputeInitialMapScaleFromExtVel(&scale);
    mpMap->Scale(scale);
} else {
    TRACE_TRACKING(1, "No external velocity, setting median depth to 1 ");
    float invMedianDepth = 1.0f / medianDepth;
    mpMap->Scale(invMedianDepth);
}
```
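(mpMap->Scale() and mpMap->Transform() here are custom additions to the Map class; stock ORB-SLAM2 has no such methods.) A minimal sketch of what such a Scale() might look like, reusing the same pattern stock ORB-SLAM2 applies to the initial map; this is a hypothetical implementation, with member names as in Map.h:

```cpp
// Hypothetical Map::Scale(): multiply every map point position and every
// keyframe translation by s. Rotations are unaffected by a scale change.
void Map::Scale(float s)
{
    unique_lock<mutex> lock(mMutexMap);
    for (MapPoint* pMP : mspMapPoints)
        pMP->SetWorldPos(pMP->GetWorldPos() * s);
    for (KeyFrame* pKF : mspKeyFrames)
    {
        cv::Mat Tcw = pKF->GetPose();   // GetPose() returns a copy
        Tcw.col(3).rowRange(0,3) = Tcw.col(3).rowRange(0,3) * s;
        pKF->SetPose(Tcw);              // camera centers scale with the map
    }
}
```

Right after initialization this is safe; rescaling a live map would additionally require pausing the tracking and local-mapping threads (and rescaling the motion-model velocity), which is why doing it at a later time is much trickier.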


thien94 commented 5 years ago

Hi @poine, can you elaborate on the mpMap->Scale() function? I believe this is your own implementation, since I cannot find it in the code.

How do you change the scale of the whole map? And suppose the odometry data is not sufficient to compute the scale at the time CreateInitialMapMonocular() is called, or you need to change the scale at a later time: can you update the scale by calling mpMap->Scale(scale) at any time?

Thank you for your time.

anika001 commented 4 years ago

Hi @poine , @thien94 and @AlejandroSilvestri .

I am posting this comment because I am working on correcting the scale factor of ORB-SLAM2 mono using the ground truth from odometry (the positions provided by the KITTI datasets). My approach is as follows:

  1. run orb slam mono and save the map -> mapping mode
  2. run orb slam and load the saved map -> localization mode

To correct the scale, what I did is as follows:

My questions:
- Is my approach correct? I mean, when updating the keyframes while loading the map, are there other things to update?
- Is there any solution to make the trajectory more continuous and smooth with a local scale?

@poine, could you please explain your approach in more detail: how did you integrate the odometry data in the code?

Thank you in advance for your help and your time

AlejandroSilvestri commented 4 years ago

@anika001

The problem isn't trivial. I believe poine is applying odometry to the initial map only. This will give you a result similar to your solution 1.

Your solution 2 is not the way to do it. In the paper Visual-Inertial Monocular SLAM with Map Reuse, Raúl Mur-Artal adds inertial measurements to the pose graph (odometry can be integrated the same way), so Bundle Adjustment corrects the keyframe poses, minimizing the overall error.
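For the simpler per-map (global) scale of solution 1, one common recipe, not specific to ORB-SLAM2, is a least-squares fit between time-synchronized SLAM and ground-truth positions. A minimal Eigen sketch (the function name is illustrative; it assumes the two trajectories are already rotationally aligned, otherwise use full Umeyama alignment, e.g. Eigen::umeyama):

```cpp
#include <Eigen/Core>
#include <vector>

// Estimate the scale s minimizing sum_i ||gt_i' - s * slam_i'||^2,
// where x' denotes positions with the centroid removed:
// s = sum_i <gt_i', slam_i'> / sum_i ||slam_i'||^2.
double EstimateScale(const std::vector<Eigen::Vector3d>& slam,
                     const std::vector<Eigen::Vector3d>& gt)
{
    Eigen::Vector3d cSlam = Eigen::Vector3d::Zero();
    Eigen::Vector3d cGt   = Eigen::Vector3d::Zero();
    for (size_t i = 0; i < slam.size(); ++i) { cSlam += slam[i]; cGt += gt[i]; }
    cSlam /= double(slam.size());
    cGt   /= double(gt.size());

    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < slam.size(); ++i) {
        num += (gt[i] - cGt).dot(slam[i] - cSlam);
        den += (slam[i] - cSlam).squaredNorm();
    }
    return num / den;   // multiply the SLAM positions by this factor
}
```

This yields one global factor; it does not fix the local scale drift that makes a monocular trajectory non-uniformly scaled, which is what the pose-graph approach above addresses.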

anika001 commented 4 years ago


Thank you @AlejandroSilvestri for your response and time

vasilinast commented 3 years ago


Hi @AlejandroSilvestri,

I am trying to do mosaicing, and therefore I would like to know how to use the camera poses. I am struggling to display the images in the correct position with respect to the first one. As far as I know, their units are relative to the median depth.
I would like to ask in what units the median depth is expressed; I am not familiar with it in monocular mode. I ran monocular ORB-SLAM2 and a median depth of 14 was printed.

AlejandroSilvestri commented 3 years ago

@vasilinast

Hi, in monocular mode the map's distance unit is virtual, of unknown real-world length. The whole map uses the same distance unit. The unit may not be exactly that median depth; I don't believe it really matters.

I'll describe a method to measure the virtual length unit in meters; you can do it in localization mode: put your camera on one spot and take note of its pose matrix, then move forward exactly 1 meter and take note of the new pose matrix. The matrix's last column is a 4-dimensional vector [x, y, z, 1]^T representing the camera position in the virtual world. The difference of those two position vectors is a displacement in virtual units, and you know it corresponds to 1 meter.

PS: the matrix must be Twc. If you have Tcw, invert it: Twc = Tcw⁻¹. Twc's last column is the position vector.
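In code, assuming you log the Tcw pose matrices the system produces (helper names here are illustrative):

```cpp
#include <opencv2/core.hpp>

// Camera center in world coordinates, recovered from Tcw
// (the convention ORB-SLAM2 uses): Ow = -Rcw^T * tcw,
// which equals the translation column of Twc = Tcw^-1.
cv::Mat CameraCenter(const cv::Mat& Tcw)
{
    cv::Mat Rcw = Tcw.rowRange(0,3).colRange(0,3);
    cv::Mat tcw = Tcw.rowRange(0,3).col(3);
    return -Rcw.t() * tcw;
}

// If the camera physically moved exactly 1 meter between the two poses,
// this value is the number of virtual map units per meter.
double UnitsPerMeter(const cv::Mat& Tcw1, const cv::Mat& Tcw2)
{
    return cv::norm(CameraCenter(Tcw2) - CameraCenter(Tcw1));
}
```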

vasilinast commented 3 years ago

Thank you for your instant reply. Understood it. Your advice was very helpful. Thanks again.

RoyAmoyal commented 5 months ago

You need another sensor to evaluate the scale. I have successfully initialized absolutely scaled orbslam maps using my robot's odometers.

Is it enough to use a known scale just at initialization? Will the system then estimate the subsequent camera positions in the real world correctly that way?