hongsukchoi / 3DCrowdNet_RELEASE

Official Pytorch implementation of "Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes", CVPR 2022
MIT License

About get_camera_trans in model.py #32

Open lllll8 opened 10 months ago

lllll8 commented 10 months ago

```python
def get_camera_trans(self, cam_param, meta_info, is_render):
    # camera translation
    t_xy = cam_param[:,:2]
    gamma = torch.sigmoid(cam_param[:,2]) # apply sigmoid to make it positive
    k_value = torch.FloatTensor([math.sqrt(cfg.focal[0]*cfg.focal[1]*cfg.camera_3d_size*cfg.camera_3d_size/(cfg.input_img_shape[0]*cfg.input_img_shape[1]))]).cuda().view(-1)
    if is_render:
        bbox = meta_info['bbox']
        k_value = k_value * math.sqrt(cfg.input_img_shape[0]*cfg.input_img_shape[1]) / (bbox[:, 2]*bbox[:, 3]).sqrt()
    t_z = k_value * gamma
    cam_trans = torch.cat((t_xy, t_z[:,None]),1)
    return cam_trans
```

Hi, how should I understand this function, in particular the parameters cam_param, gamma, k_value, t_z, and cam_trans?

hongsukchoi commented 10 months ago

Hi, cam_param[:, :2] is the xy translation of the mesh in camera coordinates, k_value is the absolute distance of the mesh from the camera computed with the perspective camera model, cam_param[:, 2] is a learned correction factor (passed through a sigmoid to give gamma) that can improve the accuracy of k_value, t_z is the final absolute distance from the camera, and cam_trans is the xyz translation of the mesh in camera coordinates.
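To make the roles of gamma, k_value, and t_z concrete, here is a minimal plain-Python sketch of the same arithmetic for a single sample, without torch. The function name `camera_depth` and the numeric config values (focal length 5000, `camera_3d_size` 2.5, input shape 256x256) are illustrative assumptions, not values confirmed in this thread, and the bounding box is assumed to store width and height in its last two entries.

```python
import math

def camera_depth(raw_gamma, focal, camera_3d_size, input_img_shape, bbox_wh=None):
    """Hypothetical scalar version of the t_z computation in get_camera_trans."""
    # Sigmoid squashes the raw network output into (0, 1), so gamma is positive.
    gamma = 1.0 / (1.0 + math.exp(-raw_gamma))
    # Distance prior from the focal lengths, assumed real-world size, and image area.
    k_value = math.sqrt(
        focal[0] * focal[1] * camera_3d_size * camera_3d_size
        / (input_img_shape[0] * input_img_shape[1])
    )
    if bbox_wh is not None:
        # Rendering branch: rescale from the cropped input area to the bbox area.
        k_value *= math.sqrt(input_img_shape[0] * input_img_shape[1]) / math.sqrt(
            bbox_wh[0] * bbox_wh[1]
        )
    return k_value * gamma

# Assumed example values; raw_gamma = 0 gives gamma = 0.5.
t_z = camera_depth(0.0, (5000, 5000), 2.5, (256, 256))
t_z_render = camera_depth(0.0, (5000, 5000), 2.5, (256, 256), bbox_wh=(128, 128))
```

With these assumed values, a bounding box half as wide and half as tall as the network input doubles k_value in the rendering branch, so the mesh is placed twice as far from the camera.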

For the theoretical background, please refer to this paper: https://arxiv.org/abs/1907.11346
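Read directly off the code in the question, the depth computation can be written as follows. The symbols are my own notation and are assumptions for illustration: $f_x, f_y$ for `cfg.focal`, $S$ for `cfg.camera_3d_size`, $H, W$ for `cfg.input_img_shape`, $w_b, h_b$ for `bbox[:, 2]` and `bbox[:, 3]` (assumed to be the box width and height), and $c_3$ for `cam_param[:, 2]`.

$$
k = \sqrt{\frac{f_x f_y S^2}{H W}}, \qquad
k_{\text{render}} = k \cdot \frac{\sqrt{H W}}{\sqrt{w_b h_b}} = \sqrt{\frac{f_x f_y S^2}{w_b h_b}}, \qquad
t_z = k \cdot \sigma(c_3)
$$

In the rendering branch the input image area cancels, so the distance depends only on the focal lengths, the assumed real-world size, and the bounding-box area.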

lllll8 commented 10 months ago

Thanks for taking the time to answer my question. You do great work!