Open letian-zhang opened 3 years ago
Hi. Since the network assumes a fixed maximum number of persons per scene, pose_vis tells it how many 2D poses are actually present and how many are just zero padding. For example, if the maximum number of persons is 5 and pose_vis is [1, 1, 1, 0, 0], only the first 3 poses are valid and used. joint_vis is simply the confidence score of each keypoint detection. Since we use top-down 2D detectors, the case where a 2D point is not detected does not occur.
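To make the padding convention concrete, here is a minimal sketch in plain Python. It is not code from mvmppe.py: the helper names `pad_poses` and `valid_poses`, the list-based layout, and the example values are all assumptions for illustration, and the actual model presumably works on batched tensors with its own shapes.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float]  # (x, y) image coordinates


def pad_poses(
    poses: Sequence[Sequence[Joint]],
    scores: Sequence[Sequence[float]],
    max_persons: int,
    num_joints: int,
) -> Tuple[List[List[Joint]], List[int], List[List[float]]]:
    """Hypothetical helper: pad detected 2D poses to a fixed person count.

    Returns (kpts, pose_vis, joint_vis), where pose_vis[i] is 1 for a real
    pose and 0 for a zero-padded slot, and joint_vis holds detector scores.
    """
    if len(poses) > max_persons:
        raise ValueError("more detected persons than max_persons")

    # Start from all-zero padding for every slot.
    kpts = [[(0.0, 0.0)] * num_joints for _ in range(max_persons)]
    pose_vis = [0] * max_persons
    joint_vis = [[0.0] * num_joints for _ in range(max_persons)]

    # Fill the leading slots with the real detections.
    for i, (pose, score) in enumerate(zip(poses, scores)):
        kpts[i] = list(pose)
        pose_vis[i] = 1
        joint_vis[i] = list(score)
    return kpts, pose_vis, joint_vis


def valid_poses(kpts: Sequence[Sequence[Joint]], pose_vis: Sequence[int]) -> List[Sequence[Joint]]:
    """Keep only the poses whose pose_vis flag is 1."""
    return [pose for pose, vis in zip(kpts, pose_vis) if vis == 1]


# Example: 3 detected persons, 2 joints each, padded to a maximum of 5.
detected = [
    [(10.0, 20.0), (12.0, 40.0)],
    [(50.0, 22.0), (51.0, 45.0)],
    [(90.0, 18.0), (93.0, 39.0)],
]
detected_scores = [[0.9, 0.8], [0.7, 0.95], [0.6, 0.5]]

kpts, pose_vis, joint_vis = pad_poses(detected, detected_scores, max_persons=5, num_joints=2)
print(pose_vis)  # [1, 1, 1, 0, 0]
print(len(valid_poses(kpts, pose_vis)))  # 3
```

In this sketch the padded slots keep all-zero keypoints and scores, so anything downstream has to consult pose_vis rather than the coordinates to tell a real pose from padding.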
Hi, I am integrating mvmppe into my project, and I don't understand two of the inputs to the network model, pose_vis and joint_vis (line 101 in mvmppe.py). Could you explain what these two inputs are? Also, the 2D estimator sometimes fails to detect some 2D pose points; how should that situation be handled? Thanks in advance for your help.
def forward(self, kpts, pose_vis, joint_vis, gt_pose_depths, gt_joint_depths, meta):