raulmur / ORB_SLAM2

Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities

coordinate system transform #226

Open jiuerbujie opened 7 years ago

jiuerbujie commented 7 years ago

Hi all, sorry to ask a very basic question, but I really need to know what a transform matrix means in ORB_SLAM2. For example, does the matrix Tcw, called the camera pose, transform a point C in the current frame's coordinates to a point W in world coordinates, or the other way around? In other words, is it W = Tcw*C or C = Tcw*W? Reading the code, I am confused.

hashten commented 7 years ago

Tcw converts a world point to the camera frame, so C = Tcw*W, and W = Twc*C. The transform is made up of Tcw = [ Rcw | tcw ] (skipping the last row of [ 0 0 0 1 ]). The translation twc is the rotated negative of tcw, twc = -Rcw^T * tcw, so Twc = [ Rcw^T | twc ] = [ Rcw^T | -Rcw^T * tcw ] (with the last row still [ 0 0 0 1 ]). The function GetCameraCenter returns twc.
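A minimal sketch of that inversion with OpenCV matrices, the representation ORB_SLAM2 itself uses; the helper name is mine, and Tcw is assumed to be a 4x4 CV_32F pose such as the one returned by KeyFrame::GetPose():

#include <opencv2/core/core.hpp>

// Invert a 4x4 SE(3) pose Tcw (world -> camera) to get Twc (camera -> world),
// using Twc = [ Rcw^T | -Rcw^T * tcw ] instead of a generic matrix inverse.
cv::Mat InvertPose(const cv::Mat &Tcw)
{
    cv::Mat Rcw = Tcw.rowRange(0,3).colRange(0,3); // rotation world -> camera
    cv::Mat tcw = Tcw.rowRange(0,3).col(3);        // translation world -> camera

    cv::Mat Rwc = Rcw.t();                         // inverse rotation
    cv::Mat twc = -Rwc * tcw;                      // camera center in world coordinates

    cv::Mat Twc = cv::Mat::eye(4, 4, Tcw.type());
    Rwc.copyTo(Twc.rowRange(0,3).colRange(0,3));
    twc.copyTo(Twc.rowRange(0,3).col(3));
    return Twc;
}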

jiuerbujie commented 7 years ago

@hashten , thanks a lot! It is much clearer to me now.

AlejandroSilvestri commented 7 years ago

@hashten , do you know what Tcw stands for? We know c stands for camera and w for world, but why Tcw and Twc in that order? I thought Tcw was "camera pose in the world reference", but it turns out to be exactly the opposite. Do you know any book that uses this notation? I couldn't find it in Hartley & Zisserman's "Multiple View Geometry". Thank you.

xslittlegrass commented 6 years ago

@AlejandroSilvestri Tcw is written in that order because we usually read the transform from right to left, which is easier to understand for a long composition. For example:

p4 = T43*T32*T21*p1

transforms a point from frame 1 to frame 4. Writing it that way also helps to prevent mistakes.

The data saved in SaveTrajectoryKITTI is the transformation Twc, which takes a point in the camera coordinate system to the world coordinate system.

In ORB_SLAM2, the frame poses are stored as relative transformations to keyframes (reference frames), so we need to concatenate the pose of the keyframe with this relative transform to get the frame pose. It is more convenient to first calculate the inverse of the frame pose:

Tfo = Tfr * Trw * Two

where
Two: the transformation that converts points in the origin coordinate system (first keyframe) to the world coordinate system
Trw: the transformation that converts points in the world coordinate system to the reference keyframe coordinate system
Tfr: the transformation that converts points in the reference keyframe coordinate system to the camera frame coordinate system

Then we invert Tfo to get Tof (the camera pose in the coordinate system of the first keyframe, i.e. the transformation that converts a point in the camera frame to the coordinate system of the first keyframe).

Here is the code that does this:

cv::Mat Two = vpKFs[0]->GetPoseInverse();           // pose inverse of the first keyframe (the origin)
Trw = Trw*pKF->GetPose()*Two;                       // pose of the reference keyframe relative to the origin
cv::Mat Tcw = (*lit)*Trw;                           // compose with the relative transform to get the frame pose (Tfo)
cv::Mat Rwc = Tcw.rowRange(0,3).colRange(0,3).t();  // rotation part of the inverse
cv::Mat twc = -Rwc*Tcw.rowRange(0,3).col(3);        // translation part of the inverse (camera center)

The last two lines calculate the inverse of Tfo.
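For context, here is a sketch of how those Rwc and twc values end up in the trajectory file, assuming the standard KITTI convention of writing the upper 3x4 block of Twc row by row; the helper name is mine, not ORB_SLAM2's:

#include <fstream>
#include <iomanip>
#include <opencv2/core/core.hpp>

// Append one KITTI-format trajectory line: the 3x4 matrix [ Rwc | twc ]
// flattened row by row (r00 r01 r02 tx r10 r11 r12 ty r20 r21 r22 tz).
void WriteKittiPoseLine(std::ofstream &f, const cv::Mat &Rwc, const cv::Mat &twc)
{
    f << std::setprecision(9);
    for(int i = 0; i < 3; i++)
    {
        for(int j = 0; j < 3; j++)
            f << Rwc.at<float>(i,j) << " ";
        f << twc.at<float>(i) << (i < 2 ? " " : "\n");
    }
}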

AlejandroSilvestri commented 6 years ago

@xslittlegrass , great explanation! Works for me.

Tomas-Lee commented 5 years ago

@AlejandroSilvestri Excuse me, what I want to ask is this: according to the notation of the book "State Estimation for Robotics", in chapter 6.3 the author defines the camera pose as the transform from camera to world. That is to say, Twc is the actual camera pose and Tcw should be the inverse of the camera pose. So Dr. raulmur's comment in his project (src/KeyFrame.h, line 176: cv::Mat Tcw //SE3 pose and camera center) should be wrong, right?

AlejandroSilvestri commented 5 years ago

@Tomas-Lee , I appreciate this explanation a lot. It helps everything make perfect sense.

Here Raúl mentions "pose and camera center" and declares these three members:
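From memory, the declaration in src/KeyFrame.h looks roughly like this (quoted approximately; the per-member comments are mine):

// SE3 Pose and camera center
cv::Mat Tcw;   // world -> camera transform (used to project map points)
cv::Mat Twc;   // camera -> world transform (its inverse)
cv::Mat Ow;    // camera center in world coordinates, i.e. twc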

I can't say Raúl's comment has errors; it's just general.

Twc is a main output of the SLAM process. On the other hand, Tcw is very important during the process, when looking for map points in the view, because map points are expressed in the world reference but need to be projected onto the image, in the camera reference.
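A minimal sketch of that projection, assuming a pinhole model with intrinsics fx, fy, cx, cy as in the ORB_SLAM2 settings files, and CV_32F matrices; the helper name is illustrative:

#include <opencv2/core/core.hpp>

// Project a 3D map point Pw (world coordinates, 3x1) into pixel coordinates,
// given the camera pose Tcw (4x4, world -> camera) and pinhole intrinsics.
cv::Point2f ProjectToImage(const cv::Mat &Pw, const cv::Mat &Tcw,
                           float fx, float fy, float cx, float cy)
{
    cv::Mat Rcw = Tcw.rowRange(0,3).colRange(0,3);
    cv::Mat tcw = Tcw.rowRange(0,3).col(3);

    cv::Mat Pc = Rcw * Pw + tcw;          // point in the camera reference
    const float z = Pc.at<float>(2);      // depth; must be positive to be visible

    const float u = fx * Pc.at<float>(0) / z + cx;
    const float v = fy * Pc.at<float>(1) / z + cy;
    return cv::Point2f(u, v);
}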

thohemp commented 5 years ago

@AlejandroSilvestri

How would I get the world position from a pixel position? As p = C[R|T]P, where p is the pixel value, C is the camera matrix (mK), [R|T] is Twc, and P is the world point. Shouldn't it work with P = Inv(mK)*Tcw*p?

AlejandroSilvestri commented 5 years ago

@thohemp ,

Well, no, it doesn't work: C[R|T] isn't invertible, it's a 3x4 matrix, not a square one. p is a 2-dimensional point and P is a 3-dimensional one: there is not enough data in p to recover P. (We are using homogeneous coordinates, so p is a 3-dimensional vector and P is a 4-dimensional one; in both cases the last element is a 1.)

Geometrically, you can backproject p as a ray (a line), so you won't know at what distance P lies from the camera. To get P you need two views and to intersect the rays.
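To illustrate the two-view case, here is a minimal sketch using OpenCV's cv::triangulatePoints; the 3x4 CV_32F projection matrices (K*[R|t] of each view) and the matched pixels are assumed to be given, and the helper name is illustrative:

#include <opencv2/core/core.hpp>
#include <opencv2/calib3d/calib3d.hpp>

// Triangulate one 3D point from a pixel correspondence in two views.
// P1, P2: 3x4 projection matrices K*[R|t] of each view.
// p1, p2: the matched pixel in each image.
cv::Mat Triangulate(const cv::Mat &P1, const cv::Mat &P2,
                    const cv::Point2f &p1, const cv::Point2f &p2)
{
    cv::Mat x1(2, 1, CV_32F), x2(2, 1, CV_32F);
    x1.at<float>(0) = p1.x; x1.at<float>(1) = p1.y;
    x2.at<float>(0) = p2.x; x2.at<float>(1) = p2.y;

    cv::Mat X4;                                   // homogeneous 4x1 result
    cv::triangulatePoints(P1, P2, x1, x2, X4);
    return X4.rowRange(0,3) / X4.at<float>(3);    // de-homogenize to 3x1
}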

thohemp commented 5 years ago

@AlejandroSilvestri Thanks for your answer.

I used p as (x,y,1). I multiply it with mK.inv(), which gives me the camera coordinates (z remains 1), then I subtract twc, the world position of the camera center, and I get the world coordinates. The distance (z) remains 1, but I can change it by using the depth value from the depth map at p(x,y), since I use RGB-D.

In this case I neglected the rotation because I didn't need it. I could account for it by multiplying with the transposed rotation matrix.

The first answer gave me the hint: https://stackoverflow.com/questions/22389896/finding-the-real-world-coordinates-of-an-image-point

It seems to work.

AlejandroSilvestri commented 5 years ago

@thohemp , that's much better.

pxl: 2D pixel in homogeneous coordinates (x, y, 1), in pixel units
pc = K^-1 pxl : 3D point in the camera reference, on the projection plane, in world units
P'c = depth * pc : 3D point in the camera reference
Pc = [P'c | 1] : 3D point in the camera reference, in homogeneous coordinates
Pw = Twc Pc : 3D point in the world reference, in homogeneous coordinates

So

Pw = Twc [ (depth K^-1 pxl) | 1 ]
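A minimal sketch of that recipe with OpenCV, assuming K is the 3x3 CV_32F intrinsic matrix (mK) and Twc the 4x4 camera-to-world pose; the helper name is illustrative:

#include <opencv2/core/core.hpp>

// Back-project a pixel with known depth to a 3D point in world coordinates:
// Pw = Twc * [ depth * K^-1 * pxl | 1 ]
cv::Mat PixelToWorld(const cv::Point2f &px, float depth,
                     const cv::Mat &K, const cv::Mat &Twc)
{
    // Pixel in homogeneous coordinates (x, y, 1)
    cv::Mat pxl = (cv::Mat_<float>(3,1) << px.x, px.y, 1.f);

    // Point on the projection plane in the camera reference, scaled by depth
    cv::Mat Pc = depth * (K.inv() * pxl);

    // Homogeneous point in the camera reference
    cv::Mat Pch = (cv::Mat_<float>(4,1) << Pc.at<float>(0), Pc.at<float>(1),
                                           Pc.at<float>(2), 1.f);

    // Transform to the world reference and drop the homogeneous coordinate
    cv::Mat Pw = Twc * Pch;
    return Pw.rowRange(0,3).clone();
}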
thohemp commented 5 years ago

@AlejandroSilvestri

Excellent. Thank you!

thohemp commented 5 years ago

@AlejandroSilvestri

In my case twc is a 3x1 vector from the function GetCameraCenter, not a 4x4 matrix as you use. In this case, would Pw = P'c - Twc be right?

AlejandroSilvestri commented 5 years ago

@thohemp

Twc is a pose transformation matrix. twc is a translation vector. Not the same thing.

Tcw = keyframe.GetPose()
Twc = Tcw^-1

You also have

Twc = keyframe.GetPoseInverse()
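To tie this back to the question above, a small sketch, assuming pKF is an ORB_SLAM2 KeyFrame pointer; the helper name is mine:

#include <opencv2/core/core.hpp>
#include "KeyFrame.h"   // ORB_SLAM2 keyframe

// twc (the camera center in world coordinates) is the translation part of the
// 4x4 pose inverse Twc, so both of these give the same 3x1 vector.
void CompareCenters(ORB_SLAM2::KeyFrame *pKF)
{
    cv::Mat Twc = pKF->GetPoseInverse();        // 4x4 camera -> world transform
    cv::Mat twc = Twc.rowRange(0,3).col(3);     // its translation column
    cv::Mat Ow  = pKF->GetCameraCenter();       // same vector, returned directly

    // To map a non-homogeneous camera point P'c into the world you rotate then
    // translate: Pw = Rwc * P'c + twc, not a plain subtraction of twc.
    CV_Assert(cv::norm(twc - Ow) < 1e-6);
}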