tum-vision / lsd_slam

LSD-SLAM
GNU General Public License v3.0
2.59k stars 1.23k forks source link

Access to final poses calculated by LSD-SLAM #183

Open aggarwal-himanshu opened 8 years ago

aggarwal-himanshu commented 8 years ago

Hi there, Somebody please help me on how to get the final camera poses(one obtained after all the optimizations) calculated by LSD-SLAM?

Regards.

avi-jit commented 8 years ago

camToWorld.data() from KeyFrameDisplay.cpp is perhaps what you're looking for. Refer to its different elements such as camToWorld.data()[0], camToWorld.data()[1], camToWorld.data()[2] for x,y,z co-ordinates; 4-7 for quaternions (orientation) and (i guess) 15th or 16th value for scale.

Still figuring a way to use the scale value. Do let me know if you find something.

kitizz commented 8 years ago

When sequence is finished, or you've told the slam system to stop tracking, do final optimizations:

slam_system->optimizeGraph();
slam_system->finalize();

Then to read out the poses, you can do the following. The key parts are getAllPoses() and pose->getCamToWorld(2) which provide the scale transformation matrix [ sR, t ]

    auto poses = slam_system_->getAllPoses();
    int N = poses.size();
    int R = 3, C = 4;
    int i = 0;

    std::string posePath = output_path + "/lsd_pose.dat";
    std::cout << "Writing transforms to:" << posePath << "\n";

    std::ofstream out;
    out.open(posePath.c_str(), std::ios::out | std::ios::binary);
    out.write((char*) &N, sizeof(int));
    out.write((char*) &R, sizeof(int));
    out.write((char*) &C, sizeof(int));

    for (FramePoseStruct* pose: poses) {
        Sim3 T = pose->getCamToWorld(10);
        auto mat = T.matrix3x4();
        for (int i=0; i<3; ++i) {
            for (int j=0; j<4; ++j) {
                double val = mat(i,j);
                out.write((char*) &val, sizeof(double));
            }
        }
    }

    out.close();

I'm just being lazy and copy pasting my own code here, but hopefully you get the gist. This dumps a binary of the all the frames to a file where the first 12 bytes of data describe the number of frames and transformation matrix size, then the rest is the raw float (double) data.

My python function for reading them out in local transforms (for projecting world points into the image) loads them into a 3D numpy array:

def read_lsdslam_poses(path):
    with open(path, 'rb') as file:
        ba = np.array(bytearray(file.read()))

    F, I, J = ba[:12].view(np.int32)
    assert I == 3 and J == 4, "Not poses?"

    T = ba[12:].view(float).reshape(F,I,J)

    Rs = np.empty((F,3,3))
    ts = np.empty((F,3))
    for f in range(F):
        R = T[f,:,:3]
        s = np.linalg.norm(R[0])
        R /= s
        Rs[f] = R.T
        ts[f] = -dot(R.T, T[f,:,3])

    return Rs, ts
aggarwal-himanshu commented 8 years ago

Thank you so much. This should be very helpful :)

On Friday 8 July 2016, Kit Ham notifications@github.com wrote:

When sequence is finished, or you've told the slam system to stop tracking, do final optimizations:

slam_system->optimizeGraph(); slam_system->finalize();

Then to read out the poses, you can do the following. The key parts are getAllPoses() and pose->getCamToWorld(2) which provide the scale transformation matrix [ sR, t ]

auto poses = slam_system_->getAllPoses();
int N = poses.size();
int R = 3, C = 4;
int i = 0;

std::string posePath = output_path + "/lsd_pose.dat";
std::cout << "Writing transforms to:" << posePath << "\n";

std::ofstream out;
out.open(posePath.c_str(), std::ios::out | std::ios::binary);
out.write((char*) &N, sizeof(int));
out.write((char*) &R, sizeof(int));
out.write((char*) &C, sizeof(int));

for (FramePoseStruct* pose: poses) {
    Sim3 T = pose->getCamToWorld(10);
    auto mat = T.matrix3x4();
    for (int i=0; i<3; ++i) {
        for (int j=0; j<4; ++j) {
            double val = mat(i,j);
            out.write((char*) &val, sizeof(double));
        }
    }
}

out.close();

I'm just being lazy and copy pasting my own code here, but hopefully you get the gist. This dumps a binary of the all the frames to a file where the first 12 bytes of data describe the number of frames and transformation matrix size, then the rest is the raw float[32] data.

My python function for reading them out in local transforms (for projecting world points into the image) loads them into a 3D numpy array:

def read_lsdslam_poses(path): with open(path, 'rb') as file: ba = np.array(bytearray(file.read()))

F, I, J = ba[:12].view(np.int32)
assert I == 3 and J == 4, "Not poses?"

T = ba[12:].view(float).reshape(F,I,J)

Rs = np.empty((F,3,3))
ts = np.empty((F,3))
for f in range(F):
    R = T[f,:,:3]
    s = np.linalg.norm(R[0])
    R /= s
    Rs[f] = R.T
    ts[f] = -dot(R.T, T[f,:,3])

return Rs, ts

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/tum-vision/lsd_slam/issues/183#issuecomment-231338530, or mute the thread https://github.com/notifications/unsubscribe/AF_9OYo7DjQOmJomlH39gQTr_t683BPoks5qTjY1gaJpZM4Hu1B9 .

Himanshu Aggarwal

aggarwal-himanshu commented 8 years ago

Hi Kitizz! Do you know how I can plot the camera trajectory using these camtoworld poses? I'm getting confused on how to use the scale while plotting the trajectory.

kitizz commented 8 years ago

Ah yep, that's something that requires a bit of thinking about. I encourage you to sit down and do the maths yourself, but I'll give you the "punchline" to work towards.

The scale is only for the depth maps not the camera location, and is key to helping LSD slam be large scale. An important fact to latch onto is that the scale output is only needed to reconstruct the scene, but not the camera poses (even though the estimation for the camera poses relies on the scale estimation).

The normalized rotation and translation, however, gives you the global transformation of the camera.

The rotation matrix is normalized by dividing by the magnitude of any row.

To visualize a frustum, you start with local coords: center at (0,0,0) and four points at z=1

Say that matrix of those points is P, a (3×5) matrix. The global locations of those points are: RP + t

It can be a bit to wrap your head around, but now that you know the end point, you should be able to play around with it on paper to get a good understanding for what's going on =)

On Fri, 22 Jul 2016, 1:35 PM Himanshu aggarwal notifications@github.com wrote:

Hi Kitizz! Do you know how I can plot the camera trajectory using these camtoworld poses? I'm getting confused on how to use the scale while plotting the trajectory.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tum-vision/lsd_slam/issues/183#issuecomment-234446708, or mute the thread https://github.com/notifications/unsubscribe-auth/ABQ9JGd9B3uT2-WUa-MYhYPN10HU3HY6ks5qYDpngaJpZM4Hu1B9 .

aggarwal-himanshu commented 8 years ago

Thanks Kit for an awesome answer. So, if I understand correctly, to plot only the camera centers, all I have to do is plot only the translation matrix. If I need to demonstrate the orientation of the camera by a funnel, then only the rotation comes into picture. Am I right? This scale thing was so confusing. I couldn't help but think that scale must be incorporated to plot the trajectory. Thanks for clearing that up :D

aggarwal-himanshu commented 8 years ago

One more thing is that lsd slam keeps crashing regularly. In that scenario how can I retrieve the poses?

LiangliangGuo commented 6 years ago

Hi,aggarwal-himanshu! I want to know if you have realised to get the final camera poses?if you have done, can you give me some help?Many thanks! Regard!

JeffreyYH commented 6 years ago

Thanks for your excellent answer @kitizz. I came across a problem when compiling the code you provided: error: invalid use of incomplete type ‘class lsd_slam::FramePoseStruct’ Sim3 T = pose->getCamToWorld(10); error: forward declaration of ‘class lsd_slam::FramePoseStruct’ class FramePoseStruct; Did you have similar error before? Thanks a lot!

alexis-dupuis commented 6 years ago

@JeffreyYafeiHu I know this topic is a bit old, but for anyone still interested in the answer to that... you just have to include the following header in the .cpp where you used kitizz's code:

include "DataStructures/FramePoseStruct.h"