rpng / open_vins

An open source platform for visual-inertial navigation research.
https://docs.openvins.com
GNU General Public License v3.0

Trajectory Alignment #227

Closed mhuzai closed 2 years ago

mhuzai commented 2 years ago

Hello,

When aligning the estimated trajectory with the ground truth trajectory, why is only position taken into account? Wouldn't taking the rotation into account help achieve a better alignment?

A related question I have is about estimating scale during alignment. Am I correct in understanding that monocular VIO cannot estimate true scale and scale has to be estimated during alignment?

Thank you!

WoosikLee2510 commented 2 years ago

1) We have multiple alignment methods, so could you be more specific about which code you are referring to? 2) Monocular VIO (camera + IMU) should be able to recover the scale. I think you are talking about monocular VO (camera only)?

mhuzai commented 2 years ago

  1. I was talking about se3. I assumed that that was the best method to align trajectories. Is that not so? I see that se3_single aligns the rotation too, but at the same time it only looks at the first pose. Is there a method that both looks at the entire trajectory and does rotational alignment as well?
  2. Thank you for the clarification!

goldbattle commented 2 years ago

I recommend you take a look at this paper to clarify what exact alignment process we use. We have basically re-implemented this work exactly (which is a very nice summary): https://rpg.ifi.uzh.ch/docs/IROS18_Zhang.pdf

The transformation we are interested in finding is the one that aligns the two frames of reference of the two trajectories, not the individual poses at each time. You might be thinking of the hand-eye calibration problem, which solves a similar issue of unknown sensor frames.

mhuzai commented 2 years ago

I did read that before but am still confused about a few things (sorry, not a SLAM person).

If two trajectories are in the same frame of reference (e.g., both ground truth and the estimated trajectory are in the IMU's frame of reference), then aligning them means aligning the origin, correct? If so, I can see how that would involve a translation. As for rotation, is the idea that roll and pitch can be resolved but yaw can't because the gravity vector doesn't change, and so a yaw-only rotation has to be estimated too?

Another question I have is that according to Table 1 in the paper, visual-inertial requires a yaw-only rigid body transformation (4DoF). But when I use yaw-only with monocular VIO, I get worse results than se3. Why is that?

goldbattle commented 2 years ago

The frame transformation is really only needed to compute the yaw and position errors. One could directly compute the roll and pitch errors, since VIO can recover those. You might get worse results with yaw-only because you are doing less alignment (4 DoF instead of 6), so the trajectory can have higher error. It is hard to say, since both posyaw and se3 are trying to find the best alignment to minimize error over the whole trajectory.
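
To make the yaw-only case concrete, here is a minimal sketch of a 4 DoF (yaw plus translation) position alignment computed over the whole trajectory. It is not the OpenVINS code: the function names `align_posyaw`, `apply_posyaw`, and `ate_rmse` are hypothetical, and it assumes both trajectories are already time-associated lists of (x, y, z) positions with the z-axis along gravity.

```python
import math

def align_posyaw(est, gt):
    """Find yaw theta and translation t minimizing
    sum_i ||gt_i - (Rz(theta) * est_i + t)||^2 (closed form)."""
    n = len(est)
    mean_e = [sum(p[k] for p in est) / n for k in range(3)]
    mean_g = [sum(p[k] for p in gt) / n for k in range(3)]
    a = 0.0  # multiplies cos(theta) in the cost
    b = 0.0  # multiplies sin(theta) in the cost
    for e, g in zip(est, gt):
        ex, ey = e[0] - mean_e[0], e[1] - mean_e[1]
        gx, gy = g[0] - mean_g[0], g[1] - mean_g[1]
        a += gx * ex + gy * ey
        b += gy * ex - gx * ey
    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)
    # The translation maps the rotated estimate centroid onto the ground-truth centroid.
    t = (mean_g[0] - (c * mean_e[0] - s * mean_e[1]),
         mean_g[1] - (s * mean_e[0] + c * mean_e[1]),
         mean_g[2] - mean_e[2])
    return theta, t

def apply_posyaw(theta, t, p):
    """Apply the yaw rotation and translation to one position."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0],
            s * p[0] + c * p[1] + t[1],
            p[2] + t[2])

def ate_rmse(est, gt, theta, t):
    """Root-mean-square position error after alignment."""
    sq = 0.0
    for e, g in zip(est, gt):
        q = apply_posyaw(theta, t, e)
        sq += sum((q[k] - g[k]) ** 2 for k in range(3))
    return math.sqrt(sq / len(est))
```

Because roll and pitch are left untouched here, any tilt error in the estimate stays in the aligned trajectory, whereas a full 6 DoF alignment can absorb part of it into the fitted rotation.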

mhuzai commented 2 years ago

Gotcha. What's confusing me now is that in the paper Zhang et al. used a yaw-only rigid body transformation for VINS-Mono. With monocular OpenVINS, yaw-only yields a worse result than se3. Is there some intuition as to why that may be?

goldbattle commented 2 years ago

I would hypothesize that this is likely due to poor roll and pitch estimates.
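
One way to check this would be to measure the roll/pitch (tilt) error directly, without any alignment. The sketch below is only an illustration under stated assumptions: orientations are body-to-world 3x3 rotation matrices (nested lists), the world z-axis is along gravity, and the helper names `rot_zyx`, `tilt_error_deg`, and `mean_tilt_error_deg` are hypothetical, not OpenVINS functions. It compares the gravity direction seen in the body frame, which does not change under a global yaw offset between the two trajectories.

```python
import math

def rot_zyx(yaw, pitch, roll):
    """Body-to-world rotation R = Rz(yaw) * Ry(pitch) * Rx(roll)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [[cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr]]

def tilt_error_deg(R_est, R_gt):
    """Angle between the world z-axis expressed in each body frame.
    R^T * e_z is the third row of R, which is independent of yaw."""
    a, b = R_est[2], R_gt[2]
    dot = sum(x * y for x, y in zip(a, b))
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    norm = math.sqrt(sum(x * x for x in cross))
    return math.degrees(math.atan2(norm, dot))

def mean_tilt_error_deg(Rs_est, Rs_gt):
    """Average tilt error over time-associated orientation pairs."""
    errs = [tilt_error_deg(e, g) for e, g in zip(Rs_est, Rs_gt)]
    return sum(errs) / len(errs)
```

If the mean tilt error is large, a yaw-only alignment has no way to compensate for it, which would be consistent with se3 reporting a lower position error.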

mhuzai commented 2 years ago

Thank you!