shubham-goel / 4D-Humans

4DHumans: Reconstructing and Tracking Humans with Transformers
https://shubham-goel.github.io/4dhumans/
MIT License
1.25k stars, 120 forks

perspective projection #125

Closed: dengyang11 closed this issue 5 months ago

dengyang11 commented 6 months ago

Hi, thanks for your wonderful work!

I am wondering why pred_cam_t[3] is computed as follows:

    pred_cam_t = torch.stack([pred_cam[:, 1],
                              pred_cam[:, 2],
                              2*focal_length[:, 0]/(self.cfg.MODEL.IMAGE_SIZE * pred_cam[:, 0] +1e-9)],dim=-1)

I look forward to your reply. Thanks!

geopavlakos commented 5 months ago

In this case, the original pred_cam[:,0] value corresponds to s, the scaling factor of the weak perspective projection, which approximates f/Z. So the depth of the human is Z = f/s. Then, we also divide by the factor bbox_size/2, so that we project the human to [-0.5,0.5].
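
The relationship described above can be checked numerically. What follows is a minimal sketch in plain Python, not the repository's code: the helper names `weak_perspective_to_translation` and `project_normalized` and all sample values are assumptions made for illustration. It converts a weak-perspective camera (s, tx, ty) to a translation using the formula from the question, then projects a point with the focal length divided by the image size.

```python
IMAGE_SIZE = 256  # assumed crop size (MODEL.IMAGE_SIZE)

def weak_perspective_to_translation(s, tx, ty, focal_length, image_size=IMAGE_SIZE):
    """Mirror the pred_cam_t formula from the question for a single sample."""
    tz = 2 * focal_length / (image_size * s + 1e-9)
    return (tx, ty, tz)

def project_normalized(point, translation, focal_length, image_size=IMAGE_SIZE):
    """Pinhole projection with the focal length divided by the image size."""
    x, y, z = (p + t for p, t in zip(point, translation))
    f = focal_length / image_size
    return (f * x / z, f * y / z)

s, focal_length = 0.9, 5000.0  # illustrative values, not taken from the thread
t = weak_perspective_to_translation(s, 0.0, 0.0, focal_length)

# A point one unit to the right of the body centre, at the reference depth.
u, v = project_normalized((1.0, 0.0, 0.0), t, focal_length)
print(t[2], u)  # u is approximately s / 2
```

In this sketch u comes out at about s * X / 2, so model coordinates that the weak-perspective camera would map to [-1, 1] land in [-0.5, 0.5] after the projection.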

dengyang11 commented 5 months ago

Thanks again. In addition, why does the focal length change with the image size? Thanks

  1. scaled_focal_length = model_cfg.EXTRA.FOCAL_LENGTH / model_cfg.MODEL.IMAGE_SIZE * img_size.max()
  2. pred_keypoints_2d = perspective_projection(pred_keypoints_3d, translation=pred_cam_t, focal_length=focal_length / self.cfg.MODEL.IMAGE_SIZE)

geopavlakos commented 5 months ago

You can use an arbitrary focal length value when you use the above equation. We adopt the design decisions of ProHMR. Note that self.cfg.MODEL.IMAGE_SIZE is constant (set to 256). For the demo code, this is just a design choice to visualize the results with larger focal length values in general. You could experiment with other values too.
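
To illustrate why the focal length can be chosen freely, here is a small self-contained sketch in plain Python. The function name `project_with_focal` and every numeric value are assumptions for illustration, not code from the repository. It derives the depth from the same scale s under several focal lengths and projects the same 3D point each time. Because the depth grows in proportion to the focal length, the 2D result changes only slightly.

```python
IMAGE_SIZE = 256  # constant crop size mentioned above

def project_with_focal(point, s, focal_length, image_size=IMAGE_SIZE):
    """Derive tz from s for the given focal length, then project one point.

    Returns 2D coordinates using the focal length divided by the image size.
    """
    tz = 2 * focal_length / (image_size * s + 1e-9)
    x, y, z = point
    f = focal_length / image_size
    return (f * x / (z + tz), f * y / (z + tz))

point = (0.3, -0.5, 0.1)  # hypothetical 3D joint in body-centred coordinates
s = 0.9                   # hypothetical weak-perspective scale

for focal_length in (1000.0, 5000.0, 20000.0):
    u, v = project_with_focal(point, s, focal_length)
    print(focal_length, round(u, 4), round(v, 4))
```

As the focal length grows, the result approaches the weak-perspective value s * x / 2. Smaller focal lengths add a little more perspective distortion for points that are off the reference depth.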

dengyang11 commented 5 months ago

Thanks again

nnop commented 4 months ago

> Then, we also divide by the factor bbox_size/2, so that we project the human to [-0.5,0.5].

@geopavlakos Do you mean normalizing to [-1, 1]? Also, I think it is more appropriate to normalize by bbox_size rather than by the image size, since it is the bounding box that gets resized to MODEL.IMAGE_SIZE.