princeton-vl / DeepV2D

BSD 3-Clause "New" or "Revised" License

How did you evaluate TUM using translational rmse(m/s) #9

Open eric-yyjau opened 4 years ago

eric-yyjau commented 4 years ago

Hi, thank you for your nice work. I'm wondering how you obtained the results in the paper. I ran the code with

python demos/demo_slam.py --dataset=tum

and extracted the poses from slam.poses. Then I used evo_rpe for evaluation, but the metrics from evo_rpe are:

{"title": "RPE w.r.t. translation part (m)\nfor delta = 1 (frames) using consecutive pairs\n(with Sim(3) Umeyama alignment)", "ref_name": "DeepV2D/data/slam/tum/rgbd_dataset_freiburg1_room/groundtruth.txt", "est_name": "DeepV2D/results/tum/poses.tum", "label": "RPE (m)"}

The aligned trajectory also doesn't look right. May I ask if there's some conversion I missed?

Thank you.

zachteed commented 4 years ago

It looks like you are plotting the translation component of the camera poses. The poses describe how 3D world points get mapped into the camera frame (w2c). To get the world coordinates of the camera, you need to invert the poses to convert them to c2w format. Also, DeepV2D only estimates depth up to a scale factor, so you will also need to scale the trajectory.
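A minimal sketch of the inversion described above, assuming slam.poses holds 4x4 world-to-camera matrices (the helper name invert_se3 is ours, not from the repo):

```python
import numpy as np

def invert_se3(pose_w2c):
    """Invert a 4x4 SE(3) world-to-camera pose to camera-to-world.

    For T = [R | t; 0 | 1], the inverse is [R^T | -R^T t; 0 | 1],
    which is cheaper and more numerically stable than np.linalg.inv.
    """
    R = pose_w2c[:3, :3]
    t = pose_w2c[:3, 3]
    pose_c2w = np.eye(4)
    pose_c2w[:3, :3] = R.T
    pose_c2w[:3, 3] = -R.T @ t
    return pose_c2w

# Camera centers in world coordinates, one per pose (assumes slam.poses
# is an iterable of 4x4 w2c matrices):
# centers = np.stack([invert_se3(T)[:3, 3] for T in slam.poses])
```

The translation column of the inverted pose is the camera center, which is what should be plotted against the TUM ground truth.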

The results in the table are evaluated using the online submission https://vision.in.tum.de/data/datasets/rgbd-dataset/online_evaluation. Note that in that table, both our method and DeepTAM are evaluated using sensor depth as input to test motion estimation in isolation.

eric-yyjau commented 4 years ago

Hi, thank you for your timely reply. Yeah, it is a lot better after the inversion.

  1. How do we correct the scale? Manually?
  2. Have you tried using the depth predicted by the module?

zachteed commented 4 years ago

You can get a scale correction by doing scale = np.sum(gtruth_xyz * pred_xyz)/np.sum(pred_xyz ** 2) if gtruth_xyz and pred_xyz are the (x, y, z) predicted/gt coordinates of the camera over the full trajectory.
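A runnable version of that scale correction (the function name and the toy trajectory are illustrative; gtruth_xyz and pred_xyz are N x 3 arrays of camera centers as in the formula above):

```python
import numpy as np

def scale_correction(gtruth_xyz, pred_xyz):
    """Least-squares scale s minimizing ||gtruth_xyz - s * pred_xyz||^2.

    Both inputs are (N, 3) arrays of camera centers over the trajectory.
    """
    return np.sum(gtruth_xyz * pred_xyz) / np.sum(pred_xyz ** 2)

# Example usage: rescale the predicted trajectory before evaluation.
# pred_aligned = scale_correction(gtruth_xyz, pred_xyz) * pred_xyz
```

This is the closed-form solution of the 1D least-squares problem in s, so it recovers the exact factor when the prediction differs from ground truth only by scale.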

However, right now slam.poses only stores the poses for the keyframes. To evaluate RPE, you will need poses for all the frames.