Closed Dene33 closed 2 years ago
Hi @Dene33,
Here is the line where we construct the camera extrinsics transformation for rendering: https://github.com/mkocabas/SPEC/blob/d2fe2c264c72c98a5f479fc36f74bdd5f45427da/spec/utils/renderer_cam.py#L112. You can see the camera translation there.
Basically, CamCalib estimates the camera rotation and focal length, and then the SPEC model estimates the camera translation with respect to the human body. One can equally interpret this as the translation of the body with respect to the camera. The latter interpretation is more useful for multi-person cases, where you want to assume that the camera is located at the origin and the human body instances are translated with respect to it. Hope this helps!
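To make the two interpretations above concrete, here is a minimal sketch in plain Python. It assumes the common convention `X_cam = R X + t`; the convention SPEC actually uses is the one in the `renderer_cam.py` line linked above. All function names and the numeric values are hypothetical and chosen only for illustration.

```python
import math

def rot_x(angle_rad):
    """Rotation about the x-axis (a pitch-like rotation) as a 3x3 nested list."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]

def matvec(M, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(M):
    """Transpose a 3x3 matrix."""
    return [[M[j][i] for j in range(3)] for i in range(3)]

def body_point_in_camera_frame(X, R, t):
    """Reading 1: camera fixed at the origin, body moved by (R, t): R X + t."""
    RX = matvec(R, X)
    return [RX[i] + t[i] for i in range(3)]

def camera_center_in_body_frame(R, t):
    """Reading 2: body fixed, camera placed at C = -R^T t."""
    return [-v for v in matvec(transpose(R), t)]

def point_seen_from_camera(X, R, C):
    """Camera-frame coordinates of X for a camera centred at C: R (X - C)."""
    return matvec(R, [X[i] - C[i] for i in range(3)])

# Illustrative values only (assumptions, not SPEC outputs).
R = rot_x(math.radians(-11.5))
t = [0.1, 0.2, 5.0]
X = [0.3, -0.4, 0.2]  # one body vertex

a = body_point_in_camera_frame(X, R, t)
C = camera_center_in_body_frame(R, t)
b = point_seen_from_camera(X, R, C)
print(a, b)  # the two readings give the same camera-frame point
```

Under this assumed convention, both readings produce the same camera-frame coordinates, so the choice between "the camera moved" and "the body moved" is only a matter of which frame you treat as fixed.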
Hi! @mkocabas
I noticed that CamCalib can predict the camera pose: cam_rotmat, cam_int, cam_vfov, cam_pitch, cam_roll, cam_focal_length. Q1: Is cam_rotmat calculated from cam_pitch and cam_roll? Q2: Why is the yaw angle not needed? Q3: What does cam_int mean, and why is it a 3x3 matrix?
Besides, in your SMPL results (SPEC), I also noticed pred_cam and pred_cam_t. Q4: What do pred_cam and pred_cam_t represent? Are they Euler angles that define how to rotate the mesh toward the camera's x-axis?
I hope you can help me solve these questions. Thanks a lot!
Q1: Yes, it is. Here is the function where we convert pitch and roll to cam_rotmat: SPEC/cam_params.py at d2fe2c264c72c98a5f479fc36f74bdd5f45427da · mkocabas/SPEC (github.com)
Q2: Estimating the yaw angle from a single image is ill-posed. The horizon gives us a common reference for estimating roll and pitch, but there is no such reference for yaw.
Q3: It is the camera intrinsics matrix, constructed as follows: https://github.com/mkocabas/SPEC/blob/d2fe2c264c72c98a5f479fc36f74bdd5f45427da/spec/utils/cam_params.py#L39-43
Q4: They are the estimated camera translation: pred_cam is [s, tx, ty] and pred_cam_t is [tx, ty, tz]. Here is how we convert from pred_cam to pred_cam_t: PARE/smpl_cam_head.py at master · mkocabas/PARE (github.com). Hence they are not related to the camera rotation.
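As a rough illustration of Q3 and Q4, the sketch below builds a 3x3 pinhole intrinsics matrix and converts a weak-perspective camera [s, tx, ty] into a translation [tx, ty, tz]. It is a sketch under assumptions, not the SPEC or PARE code: the centred principal point, the conversion `tz = 2 f / (crop_res * s)` (the form commonly used in HMR-style code), the function names, and the values 5000 and 224 are all assumptions made for illustration. The linked files remain the authoritative definitions.

```python
def intrinsics_from_focal_length(focal_px, img_w, img_h):
    """3x3 pinhole matrix K, assuming square pixels and a centred principal point."""
    return [[focal_px, 0.0, img_w / 2.0],
            [0.0, focal_px, img_h / 2.0],
            [0.0, 0.0, 1.0]]

def weak_perspective_to_translation(pred_cam, focal_px, crop_res):
    """Map [s, tx, ty] to [tx, ty, tz] with tz = 2 f / (crop_res * s).

    Assumed HMR-style conversion; the small epsilon avoids division by zero.
    """
    s, tx, ty = pred_cam
    tz = 2.0 * focal_px / (crop_res * s + 1e-9)
    return [tx, ty, tz]

def project(point_cam, K):
    """Perspective projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = point_cam
    u = K[0][0] * x / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return u, v

# Illustrative values only (assumptions, not SPEC defaults or outputs).
focal_px, crop_res = 5000.0, 224
K = intrinsics_from_focal_length(focal_px, crop_res, crop_res)
pred_cam_t = weak_perspective_to_translation([0.9, 0.05, -0.02], focal_px, crop_res)
u, v = project(pred_cam_t, K)
print(pred_cam_t, (u, v))
```

The matrix is 3x3 because it maps camera-frame points to homogeneous pixel coordinates. Under the assumed conversion, the scale s only determines the depth tz, which is why pred_cam and pred_cam_t carry translation and no rotation.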
@mkocabas Thank you for your answer. I think I am very close to understanding everything, but a few details are still unclear, and I hope you can help me further. Here is one example: I manually set the bbox to cover the whole input image, and the estimates are pitch=-11.5, roll=-1.1, pred_cam_t = [1.768, 0.031, 0.1115]. I'm not sure how these values relate to each other. When I visualize the estimated 3D mesh, I notice that the root joint is very close to the origin. Which coordinate system is the 3D mesh in, and how do I convert it into camera coordinates? Given 3D joints X, does RX+t convert them into camera coordinates?
Besides, how are the pitch and roll calculated? Are the conventions the same as in the picture shown? I still haven't figured out how the camera coordinate system is set up: does the Zc axis point toward the bbox center? SPEC estimates a horizon line; do you mean that the camera center lies on this horizontal plane, and that pitch and roll then rotate the camera coordinate frame so that the Zc axis faces the human body or the bbox center?
I hope you can help me, thanks a lot!
I'd like to know whether it's possible to get the translation of the camera. I've only found the code for rotation and focal length estimation.
In this image https://github.com/mkocabas/SPEC/blob/master/docs/assets/spec_gif.gif the camera is somehow placed in world space. How did you do this? Thanks.