nianticlabs / monodepth2

[ICCV 2019] Monocular depth estimation from a single image
Other
4.16k stars 962 forks source link

Questions about evaluating poses. #143

Open zhongcl-thu opened 4 years ago

zhongcl-thu commented 4 years ago

Hi, I have some questions about comparing gt pose and predicted pose.

In evaluate_pose.py,

Line119-121

for i in range(0, num_frames - 1):
        local_xyzs = np.array(dump_xyz(pred_poses[i:i + track_length - 1]))
        gt_local_xyzs = np.array(dump_xyz(gt_local_poses[i:i + track_length - 1]))

The poses in pred_poses and gt_localposes mean the transformation from (t-1)th frame to (t) frame, which can be denoted as $T{t t-1}$ or $T_{t-1->t}$. But i think the error happens when they are fed into the function dump_xyz:

def dump_xyz(source_to_target_transformations):
    xyzs = []
    cam_to_world = np.eye(4)
    xyzs.append(cam_to_world[:3, 3])
    for source_to_target_transformation in source_to_target_transformations:
        cam_to_world = np.dot(cam_to_world, source_to_target_transformation)
        xyzs.append(cam_to_world[:3, 3])
    return xyzs

This line cam_to_world = np.dot(cam_to_world, source_to_target_transformation) means $T{00}*T{10}$, and next, $T{00}*T{10}*T_{21}$..., I think it seems incorrect because you can't get the xyzs from that.

In my opinion, to get a correct result, we should inverse the pred_poses and gt_local_poses. Because the xyzs can be obtained by $T{01} = T{00}T{01}$, $T{02}=T_{01}T_{12}$...

Thanks.

mrharicot commented 4 years ago

Hi, I agree that this looks suspicious, have you tried swapping them?

zhongcl-thu commented 4 years ago

@mrharicot Thanks for your reply. Yeah, i did it. After I inverted the pred_poses and gt_local_poses, the evaluation result(ATE-5frames) of seq09 is still 0.018+0.010. But when I tested them with other's pred poses, the results of new codes are closer to other's reports than the results of your original codes.

daniyar-niantic commented 4 years ago

@mdfirman potential bug?

mdfirman commented 4 years ago

Interesting – thanks for the flag @wangjx123 . It's interesting too that the monodepth2 results stay the same.

Could you share please the code snippet of the change you made? Thanks.

daniyar-niantic commented 4 years ago

@wangjx123 Hi, could you share your code for pose computation please?

Beniko95J commented 3 years ago

Can this bug be fixed by passing invert=True in line 100?

mdfirman commented 3 years ago

@Beniko95J : Thanks, I tried this but it seemed to make all results worse

Overall the code should be a straightforward copy of the functionality in sfm-learner, but there's every chance that something has gone wrong in conversion to our codebase and formats.

@wangjx123 : Are you able to share the modifications you made? Either here or as a branch.

I've pushed a branch mfirman/pose_eval which plots some visualisations for pose evaluation which may be useful for us all trying to get to the bottom of this.

JonasKonrad commented 1 year ago

Hey all,

I have recently stumbled over this issue. After a thorough code review, I can say: The errors in the pose evaluation are to be neglected[^1]. Indeed, we should inverse the ego-motion transforms / relative poses if we want to correctly concatenate the trajectories with a multiplication from left to right as it's done in the code. We could, instead, change the order of the multiplication with the non-inverted matrices (which yields the inverse of the multiplication compared to the correct (inverted) transformations, i.e. inverted trajectories). This still results in the same numbers for the errors the way they're calculated (ATE). But, having said that, why can we neglect the error using the wrong order of multiplication? For small transformations $\mathbf{A}, \mathbf{B}, \mathbf{C} \in \mathrm{SE}(3)$ , let's call them delta transformations, the statement $\mathbf{A} \mathbf{B} \mathbf{C} \approx\mathbf{C} \mathbf{B} \mathbf{A}$ holds. Dealing with delta transformations, one has to consider: The error adds up though. I have played around with longer trajectories -- the evaluation uses 5 by default -- and indeed, for larger trajectory sizes, such as 20, the difference cannot be neglected anymore.

What a late reply on that issue, may this help a future user.

[^1]: With the given precision of calculation and rounding (0.3f in the end).