Closed jianweif closed 3 years ago
Hello,
Thank you for pointing this out! Our model does output rotation vectors, and they should be converted to Euler angles before the comparison in the notebooks. I will update the evaluation notebooks to reflect this change. Thanks!
Hi,
There is still the problem that the Euler angle convention and the coordinate system convention do not match those used in AFLW2000-3D. This can lead to different errors for yaw, pitch and roll, even when the same convention is used for both ground truth and estimate. The literature usually follows the AFLW2000-3D convention, so your results are not comparable to it.
When looking at the person from the front as an observer, the X-axis of the AFLW2000-3D coordinate system points to the right, the Y-axis points up and the Z-axis points towards the observer. In your coordinate system (OpenCV) the Y- and Z-axes are inverted. Additionally, the Euler (Tait-Bryan) convention in AFLW2000-3D is 'XYZ' intrinsic rotations (with counter-clockwise rotation).
So to convert your rotvec to the AFLW2000-3D Euler convention, you need to use the function you already provided in https://github.com/vitoralbiero/img2pose/issues/17#issuecomment-794624478 (convert_to_aflw). But then use the values returned by this function directly and do not transform them again. The errors will differ from your current ones.
Ideally you would additionally use the ground truth from AFLW2000-3D (pose_para) instead of your own ground truth.
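The convention described above can be sketched in a few lines of plain Python. This is only an illustration, not the repository's convert_to_aflw: the helper names are hypothetical, and it assumes that the rotation vector describes an active rotation in the OpenCV frame, that the frame change is a pure inversion of the Y- and Z-axes, and that the angles decompose as R = Rx(pitch) · Ry(yaw) · Rz(roll). Libraries such as SciPy (scipy.spatial.transform.Rotation) provide the same conversions.

```python
import math


def rotvec_to_matrix(rotvec):
    """Rodrigues' formula: rotation vector (axis * angle in radians) -> 3x3 matrix."""
    theta = math.sqrt(sum(c * c for c in rotvec))
    if theta < 1e-12:  # no rotation
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in rotvec)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    v = 1.0 - cos_t
    return [
        [cos_t + kx * kx * v, kx * ky * v - kz * sin_t, kx * kz * v + ky * sin_t],
        [ky * kx * v + kz * sin_t, cos_t + ky * ky * v, ky * kz * v - kx * sin_t],
        [kz * kx * v - ky * sin_t, kz * ky * v + kx * sin_t, cos_t + kz * kz * v],
    ]


def flip_y_and_z(rot):
    """Express a rotation in a frame with inverted Y- and Z-axes: F @ R @ F, F = diag(1, -1, -1)."""
    f = (1.0, -1.0, -1.0)
    return [[f[i] * rot[i][j] * f[j] for j in range(3)] for i in range(3)]


def matrix_to_euler_xyz_intrinsic(rot):
    """Decompose R = Rx(pitch) @ Ry(yaw) @ Rz(roll); returns degrees.

    Ignores the gimbal-lock case (yaw at exactly +/-90 degrees).
    """
    yaw = math.asin(max(-1.0, min(1.0, rot[0][2])))
    pitch = math.atan2(-rot[1][2], rot[2][2])
    roll = math.atan2(-rot[0][1], rot[0][0])
    return tuple(math.degrees(a) for a in (pitch, yaw, roll))


def opencv_rotvec_to_aflw_euler(rotvec):
    """Hypothetical helper: OpenCV-frame rotation vector -> (pitch, yaw, roll) in degrees."""
    return matrix_to_euler_xyz_intrinsic(flip_y_and_z(rotvec_to_matrix(rotvec)))


# A pure rotation of 0.3 rad about the OpenCV Y-axis changes sign in the flipped frame.
pitch, yaw, roll = opencv_rotvec_to_aflw_euler((0.0, 0.3, 0.0))
print(pitch, yaw, roll)
```

Under these assumptions the axis inversion leaves the angle about X unchanged and negates the angles about Y and Z, which is why applying a second transformation to already converted angles changes the reported errors.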
Hello @KarlKulator,
Thank you for the suggestion.
As our model is not constrained to yaw angles in (-90, 90), the AFLW2000-3D convention fails to measure our errors accurately. One example is the image below, where the error is qualitatively small in the visualization and small in zxy rotation angles, but large in xyz rotation angles, because yaw is predicted a little above 90 degrees.
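The effect can be reproduced with made-up numbers. The sketch below assumes the xyz convention means R = Rx(pitch) · Ry(yaw) · Rz(roll) and uses hypothetical helper names; instead of zxy angles it compares against the geodesic angle between the two rotations, which is a substitute chosen here for brevity and not the measure used in the notebooks.

```python
import math


def euler_xyz_to_matrix(pitch, yaw, roll):
    """Build R = Rx(pitch) @ Ry(yaw) @ Rz(roll) from angles in degrees."""
    a, b, c = (math.radians(v) for v in (pitch, yaw, roll))
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cc, sc = math.cos(c), math.sin(c)
    return [
        [cb * cc, -cb * sc, sb],
        [ca * sc + sa * sb * cc, ca * cc - sa * sb * sc, -sa * cb],
        [sa * sc - ca * sb * cc, sa * cc + ca * sb * sc, ca * cb],
    ]


def matrix_to_euler_xyz(rot):
    """Inverse of euler_xyz_to_matrix; the recovered yaw always lies in [-90, 90]."""
    yaw = math.asin(max(-1.0, min(1.0, rot[0][2])))
    pitch = math.atan2(-rot[1][2], rot[2][2])
    roll = math.atan2(-rot[0][1], rot[0][0])
    return tuple(math.degrees(v) for v in (pitch, yaw, roll))


def geodesic_angle_deg(rot_a, rot_b):
    """Angle (degrees) of the single rotation that takes rot_a to rot_b."""
    trace = sum(rot_a[k][i] * rot_b[k][i] for i in range(3) for k in range(3))
    return math.degrees(math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0))))


gt_angles = (5.0, 88.0, 3.0)                    # made-up ground truth (pitch, yaw, roll)
pred_rot = euler_xyz_to_matrix(5.0, 95.0, 3.0)  # made-up prediction, yaw just past 90
pred_angles = matrix_to_euler_xyz(pred_rot)     # roughly (-175, 85, -177)

# Per-angle errors: roughly [180, 3, 180], although the poses are close.
per_angle_error = [abs(g - p) for g, p in zip(gt_angles, pred_angles)]
# Angle between the two rotations: roughly 7 degrees.
rotation_error = geodesic_angle_deg(euler_xyz_to_matrix(*gt_angles), pred_rot)
print(pred_angles, per_angle_error, rotation_error)
```

Because the xyz decomposition keeps yaw within [-90, 90], a yaw of 95 degrees is re-expressed with pitch and roll shifted by 180 degrees, so two nearby poses can show per-angle errors of about 180 degrees.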
Nevertheless, in response to your question, we released a model trained with constrained yaw poses and updated the evaluation notebooks to use the AFLW2000-3D ground-truth (pose_para) with its standard convention (xyz).
The updated version obtains state-of-the-art accuracy when measured in the AFLW2000-3D representation (in fact, pose estimation results improved in some cases). We will update the camera-ready version of our CVPR'21 paper and the arXiv version accordingly.
Hi,
Thanks for your code. I was checking your notebook biwi_evaluation.ipynb and found that the ground truth and the predictions are both in rotation vector representation (a 3D vector whose direction is the rotation axis and whose norm is the rotation angle). However, when you print out the errors, you label them as errors on yaw, pitch and roll. Shouldn't they be errors on the rotation vector?
Evidence of the ground truth being a rotation vector:
Here it's clear that pose_target is first a rotation matrix and is then converted to a rotation vector by as_rotvec(). To convert to pitch, yaw and roll it should be
Thanks!