vitoralbiero / img2pose

The official PyTorch implementation of img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation - CVPR 2021

Is Prediction Rotation Vector or Euler Angles (yaw, pitch roll)? #15

Closed jianweif closed 3 years ago

jianweif commented 3 years ago

Hi,

Thanks for your code. I was checking your notebook biwi_evaluation.ipynb and found that the ground truth and predictions are both in the rotation-vector representation (a 3D vector whose direction is the rotation axis and whose norm is the rotation angle). However, when you print the errors, you label them as errors on yaw, pitch, and roll. Shouldn't they be errors on the rotation-vector components?

Evidence of ground truth being rotation vector:

    img_path, pitch, yaw, roll = sample
    pitch = float(pitch)
    yaw = float(yaw)
    roll = float(roll)

    annotations = open(img_path.replace("_rgb.png", "_pose.txt"))
    lines = annotations.readlines()

    pose_target = []
    for i in range(3):
        lines[i] = str(lines[i].rstrip("\n")) 
        pose_target.append(lines[i].split(" ")[:3])

    pose_target = np.asarray(pose_target)       
    pose_target = Rotation.from_matrix(pose_target).as_rotvec()        
    pose_targets.append(pose_target)

Here it's clear that pose_target starts out as a rotation matrix and is then converted to a rotation vector by as_rotvec(). To convert it to pitch, yaw, and roll, it should instead be

    pose_target = Rotation.from_matrix(pose_target).as_euler('xyz', degrees=False) # in order pitch, yaw, roll
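The distinction is easy to check with scipy itself; a minimal sketch (the angle values are arbitrary, not from the notebook):

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Build a pose from known Euler angles: 30 deg pitch (x), 60 deg yaw (y),
# 10 deg roll (z), extrinsic 'xyz' order.
R = Rotation.from_euler('xyz', [30, 60, 10], degrees=True)

rotvec = np.degrees(R.as_rotvec())       # axis * angle, here in degrees
euler = R.as_euler('xyz', degrees=True)  # pitch, yaw, roll

# euler recovers (30, 60, 10); rotvec does not, because its components are
# the rotation axis scaled by the rotation angle, not Euler angles.
print(rotvec, euler)
```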

Thanks!

vitoralbiero commented 3 years ago

Hello,

Thank you for pointing this out! Our model is indeed outputting rotation vectors, and the predictions compared in the notebooks should first be converted to Euler angles. I will update the evaluation notebooks to reflect this change. Thanks!

KarlKulator commented 3 years ago

Hi,

there is still the problem that the Euler-angle convention and the coordinate-system convention do not match the conventions used in AFLW2000-3D. This can lead to different errors for yaw, pitch, and roll even when the same convention is used for both ground truth and estimate. The literature usually follows the AFLW2000-3D convention, which makes your results not comparable to it.

When looking at the person from the front as an observer, in the AFLW2000-3D coordinate system the X-axis points to the right, the Y-axis up, and the Z-axis towards the observer. In your coordinate system (OpenCV), the Y- and Z-axes are inverted. Additionally, the Euler (Tait-Bryan) convention in AFLW2000-3D is 'XYZ' intrinsic rotations (with counter-clockwise rotation).
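If I read the conventions above correctly, the frame change can be sketched as a conjugation by an axis-flip matrix. This is a hypothetical helper, not the repo's convert_to_aflw; the axis conventions are assumed from this thread:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def rotvec_to_aflw_euler(rotvec):
    """Hypothetical sketch: re-express an OpenCV-frame rotation in the
    AFLW2000-3D frame (Y and Z axes inverted), then read intrinsic 'XYZ'
    Euler angles in degrees, as AFLW2000-3D does."""
    S = np.diag([1.0, -1.0, -1.0])                # flip Y and Z axes
    R = Rotation.from_rotvec(rotvec).as_matrix()
    R_flipped = S @ R @ S                         # same rotation, expressed in the flipped frame
    return Rotation.from_matrix(R_flipped).as_euler('XYZ', degrees=True)

# A pure yaw (y-axis) rotation changes sign between the two frames,
# while a pure pitch (x-axis) rotation is unchanged.
print(rotvec_to_aflw_euler([0.0, 0.2, 0.0]))
print(rotvec_to_aflw_euler([0.2, 0.0, 0.0]))
```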

So to convert your rotvec to the AFLW2000-3D Euler convention, you need to use the function you already provided in https://github.com/vitoralbiero/img2pose/issues/17#issuecomment-794624478 (convert_to_aflw), but then use the return value of this function directly, without transforming it again. The errors will differ from your current ones.

Ideally, you would additionally use the ground truth from AFLW2000-3D (pose_para) instead of your own ground truth.

vitoralbiero commented 3 years ago

Hello @KarlKulator,

Thank you for the suggestion.

As our model is not constrained to yaw angles in (-90, 90), the AFLW2000-3D convention fails to accurately measure our errors. One example is the image below, where the error is qualitatively small in the visualization and small in zxy rotation angles, but large in xyz rotation angles, because the yaw is predicted slightly above 90 degrees.

(image: example_xyz_zxy)
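The discrepancy is reproducible with scipy; a minimal sketch using a synthetic near-profile pose (not our model's output):

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Ground truth yaw 89 deg, predicted yaw 91 deg: a 2-degree real error.
gt = Rotation.from_euler('zxy', [0, 0, 89], degrees=True)
pred = Rotation.from_euler('zxy', [0, 0, 91], degrees=True)

err_zxy = np.abs(gt.as_euler('zxy', degrees=True) - pred.as_euler('zxy', degrees=True))
err_xyz = np.abs(gt.as_euler('xyz', degrees=True) - pred.as_euler('xyz', degrees=True))

# zxy reports a small error on yaw only; xyz reports ~180-degree errors on
# pitch and roll, because its middle (yaw) angle is restricted to (-90, 90).
print(err_zxy)
print(err_xyz)
```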

Nevertheless, following your suggestion, we released a model trained with constrained yaw poses and updated the evaluation notebooks to use the AFLW2000-3D ground truth (pose_para) with its standard convention (xyz).

The updated version obtains state-of-the-art accuracy when measured in the AFLW2000-3D representation (pose estimation results actually improved in some cases). We will update the camera-ready version of our CVPR'21 paper and the arXiv version accordingly.