tum-vision / LDSO

DSO with SIM(3) pose graph optimization and loop closure

Test this algorithm on TUM SLAM data #67

Closed · echo-gui closed this issue 2 years ago

echo-gui commented 2 years ago

Hi, I am trying to test LDSO on the SLAM for Omnidirectional Cameras dataset, which is a very valuable dataset. Is there any chance you could help with this issue? I am using the data from https://vision.in.tum.de/data/datasets/omni-lsdslam, but the timestamps in the ground-truth file do not match the image filenames. In the picture below, the left part shows the image filenames, and the right part shows the ground-truth file, whose rows start with 0.00000, 0.01000, and so on.

[Screenshot from 2022-03-28 22-01-50: image filenames (left) vs. ground-truth timestamps (right)]

I wonder how to align the ground-truth timestamps with the images, because the trajectory our algorithm computes uses the image timestamps. Below are the trajectories of our method and the ground truth on T1. When we align our method against ORB-SLAM2 mono instead, the result looks fine, so we really want to know how to compare against the ground truth.

[Plot: our method's trajectory vs. ground truth on T1]

The timestamps in the ground-truth file start at 0.00000, so I added the start timestamp of the images to every ground-truth timestamp, and got the result below: our method compared against the ground truth, which looks very weird.

[Plot: our method vs. ground truth after shifting the GT timestamps]

Any suggestion would be appreciated! Have a nice day. @NikolausDemmel

NikolausDemmel commented 2 years ago

The GT is unsynchronized, unfortunately, so some kind of temporal alignment (and maybe also tracking-frame alignment == hand-eye calibration) is needed on top of the world-frame alignment. I'm including below some responses from the original author (David Caruso) from an email discussion a while back. Hope it helps.


Lastly, each scene has a directory for the images (e.g. "T1/images"). Sadly, each image is named with a timestamp that doesn't correspond to any timestamp in the GT txt file. Neither a pairing list nor any information about how to match each image with a GT location is provided (as the TUM RGB-D dataset does).

[DC] At that time we did not have any way to synchronize the mocap with the camera. The delay (and maybe also a scale factor, I am not sure anymore) between the two clocks was estimated simultaneously with the trajectory alignment in post-processing. (Btw, errors displayed in papers are thus actually a lower bound on the real error; this is generally the case when trajectories are rigidly aligned in post-processing anyway.)

I would like to know how to use this dataset properly; could you please guide me?

[DC] You should be able to easily use the image stream in your SLAM system thanks to the provided calibration files. Once you have a trajectory, you can register the ground-truth trajectory to your estimated trajectory while simultaneously estimating the clock scale factor / offset. If you want to be more accurate, offline SfM can be used for the alignment part, but in that case you would likely want to calibrate the camera yourself ... which is not easy to do, as the calibration sequences were lost at some point, so you will need to do online calibration. Admittedly, all of this is a bit cumbersome.

[FOLLOWUP] I'm actually curious how exactly you implemented that joint alignment of world frame and time offset. Did you use some existing (open-source) tool, or did you write it yourself as part of the evaluation code? I think it's not so trivial to do this properly, since the problem is non-convex.

[DC] If I remember well, all the alignment relied on carefully monitored use of MATLAB's fminsearch... This is not satisfying from a "provable algorithm" point of view, but it empirically gave reliable enough results for our purpose. (By reliable I mean that the estimated time delays were consistent and reproducible across trajectories from several LSD-SLAM/ORB-SLAM runs.)
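
To make that concrete, here is a minimal Python sketch of such a joint alignment, with scipy's Nelder-Mead standing in for fminsearch. This is not the original evaluation code: the 8-parameter model (time offset, clock scale, rotation vector, translation), the interpolation-based cost, and all names are illustrative assumptions.

```python
# Sketch of joint time-offset/scale + rigid alignment via Nelder-Mead
# (scipy's analogue of MATLAB's fminsearch). Illustrative only.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation


def ate_cost(params, t_est, p_est, t_gt, p_gt):
    """RMS position error after re-stamping the GT clock (offset + scale)
    and rigidly transforming GT into the estimate's world frame."""
    dt, scale, rx, ry, rz, tx, ty, tz = params
    # Re-stamp GT onto the estimate's clock (assumes scale stays positive,
    # which holds for the gentle perturbations Nelder-Mead makes here).
    t_gt_shifted = scale * t_gt + dt
    # Interpolate GT positions at the estimated timestamps (positions only,
    # which is enough for an ATE-style cost).
    p_gt_interp = np.stack(
        [np.interp(t_est, t_gt_shifted, p_gt[:, k]) for k in range(3)], axis=1)
    R = Rotation.from_rotvec([rx, ry, rz]).as_matrix()
    p_aligned = p_gt_interp @ R.T + np.array([tx, ty, tz])
    return np.sqrt(np.mean(np.sum((p_aligned - p_est) ** 2, axis=1)))


if __name__ == "__main__":
    # Tiny synthetic check: GT equals the estimate, but time-shifted by 0.5 s
    # and expressed in a translated + rotated world frame.
    t_est = np.linspace(0.0, 10.0, 200)
    p_est = np.stack([np.cos(t_est), np.sin(t_est), 0.1 * t_est], axis=1)
    R_true = Rotation.from_rotvec([0.0, 0.0, 0.2]).as_matrix()
    t_gt = t_est - 0.5
    p_gt = (p_est - np.array([1.0, 2.0, 0.0])) @ R_true
    x0 = np.array([0.0, 1.0, 0, 0, 0, 0, 0, 0])  # offset, scale, rotvec, trans
    res = minimize(ate_cost, x0, args=(t_est, p_est, t_gt, p_gt),
                   method="Nelder-Mead",
                   options={"maxiter": 20000, "fatol": 1e-12, "xatol": 1e-10})
    print("estimated [dt, scale, rx, ry, rz, tx, ty, tz]:", res.x)
```

As noted above, the problem is non-convex, so in practice the initial guess matters and the result needs to be "carefully monitored" (e.g. checked for consistency across several runs).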

echo-gui commented 2 years ago

Thanks for all the information! I have been working on this myself, and it is really nice of you to provide so much information.

Well, aligning the time sequence with the images is no easy job. For now, I would rather choose another dataset. If I get the time and chance to solve the alignment problem, I will post the solution here. :)

NikolausDemmel commented 2 years ago

Yes, it's not easy to do.

Have you seen the TUMVI dataset? https://vision.in.tum.de/data/datasets/visual-inertial-dataset

It has a similar lens, calibration sequences that work with Kalibr and Basalt (so you can calibrate any model you want), and synchronized ground truth from motion capture. Most sequences have GT only at the start and end (so you can evaluate odometry drift). The "room" sequences have full GT (with a few small gaps), but the motion is of course limited since it's a single room.

echo-gui commented 2 years ago

I just had a look at the TUMVI dataset and the related paper.

[Screenshot: TUM-VI calibration parameters, including cam_time_offset_ns, T_i_c0, T_i_c1]

I might need to synchronize the image timestamps with the IMU GT using the cam_time_offset_ns shown in the image above, and use T_i_c0 and T_i_c1 to get the per-frame pose GT from the IMU GT. Does the camera time offset have the same value in all the sequences?

NikolausDemmel commented 2 years ago

Use the calibrated files. Those already have the time offset taken into account, so IMU, GT and cameras are all in the same time frame.

GT is w.r.t. the IMU (Tgt_world_imu), so you are right: for example, for the left camera you can get GT in that frame by computing

Tgt_world_cam0 = Tgt_world_imu * T_i_c0

where T_i_c0 comes from the calibration. It should be the same for all sequences.
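
In numpy terms, assuming poses are stored as 4x4 homogeneous matrices (variable and function names here are illustrative, not from any TUM-VI tooling), that composition is just:

```python
import numpy as np

def gt_world_cam0(T_world_imu: np.ndarray, T_i_c0: np.ndarray) -> np.ndarray:
    """Tgt_world_cam0 = Tgt_world_imu * T_i_c0: right-multiplying by the
    extrinsics maps cam0 coordinates into the IMU frame, then into world."""
    return T_world_imu @ T_i_c0

# For a whole GT trajectory stored as an (N, 4, 4) array, numpy broadcasts
# over the leading axis: gt_cam0 = gt_imu @ T_i_c0
```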

Note that the GT timestamps come at 120 Hz and don't correspond exactly to camera or IMU timestamps, so you have to associate poses by interpolation or nearest neighbor (but be wary of the small holes in the GT, so there should be a max-time-difference threshold when associating poses). Existing ATE evaluation tools usually have such an association step built in.
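
A minimal sketch of such an association step (nearest neighbor with a max-time-difference cutoff; the 10 ms default is an illustrative choice, not an official value):

```python
import numpy as np

def associate(t_est, t_gt, max_diff=0.01):
    """Pair each estimated timestamp with the nearest GT timestamp, dropping
    pairs whose gap exceeds max_diff (guards against the small GT holes).
    Both arrays must be sorted ascending. Returns (est_idx, gt_idx)."""
    idx = np.searchsorted(t_gt, t_est)       # insertion points into t_gt
    idx = np.clip(idx, 1, len(t_gt) - 1)     # keep both neighbors in range
    left, right = t_gt[idx - 1], t_gt[idx]
    nearest = np.where(t_est - left < right - t_est, idx - 1, idx)
    keep = np.abs(t_gt[nearest] - t_est) <= max_diff
    return np.flatnonzero(keep), nearest[keep]
```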

echo-gui commented 2 years ago

I got your point and I will give it a try :) You can close this issue.