mks0601 / 3DMPPE_POSENET_RELEASE

Official PyTorch implementation of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image", ICCV 2019
MIT License

[ Visualization: MuCo/MuPoTS ] #92

IemProg opened this issue 3 years ago

IemProg commented 3 years ago

Hi,

Wonderful work, and thanks for sharing the code.

I would like to know if you have provided any code to visualize the MuCo dataset, and whether I need to transform the coordinates to visualize them as in demo.py (is the input in demo.py from the MuCo or the MuPoTS dataset)?

```python
# Image taken from MuCo dataset
for i, n in enumerate(centers_id):
    n = int(n[0])
    img_path = dataset[n]["img_path"]
    original_img = cv2.imread(img_path)
    original_img_height, original_img_width = original_img.shape[:2]

    # normalized camera intrinsics
    focal = [1500, 1500] # x-axis, y-axis
    princpt = [original_img_width/2, original_img_height/2]
    img, img2bb_trans = generate_patch_image(original_img, np.array(bbox_imgs[i]), False, 1.0, 0.0, False)
    imgs_cropped.append(img)

    # inverse affine transform (restore the crop and resize)
    pose_3d = dataset[n]["joint_img"]
    pose_3d[:, 0] = pose_3d[:,0] / cfg.output_shape[1] * cfg.input_shape[1]
    pose_3d[:, 1] = pose_3d[:,1] / cfg.output_shape[0] * cfg.input_shape[0]
    pose_3d_xy1 = np.concatenate((pose_3d[:, :2], np.ones_like(pose_3d[:, :1])), 1)
    img2bb_trans_001 = np.concatenate((img2bb_trans, np.array([0, 0, 1]).reshape(1, 3)))
    pose_3d[:, :2] = np.dot(np.linalg.inv(img2bb_trans_001), pose_3d_xy1.transpose(1, 0)).transpose(1,0)[:, :2]
    pose_2d = pose_3d[:, :2].copy()
    list2d.append(pose_2d)

    # root-relative discretized depth -> absolute continuous depth
    pose_3d[:,2] = (pose_3d[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + root_depth_list[0]
    pose_3d = pixel2cam(pose_3d, focal, princpt)
    list3d.append(pose_3d.copy())

    # Visualize 2d poses
    vis_img = original_img.copy()
    for i in range(len(list2d)):
        vis_kps = np.zeros((3, joint_num))
        vis_kps[0, :] = list2d[i][:, 0]
        vis_kps[1, :] = list2d[i][:, 1]
        vis_kps[2, :] = 1
        vis_img = vis_keypoints(vis_img, vis_kps, skeleton)
        name = folder + "/Center_{}_ID_{}.png".format(str(i), str(n))
        cv2.imwrite(name, vis_img)
```

Thanks!

mks0601 commented 3 years ago

A1. The MuCo dataset has its own focal lengths; the ones here are just normalized focal lengths. I used the normalized ones because in-the-wild images (e.g., images from the MSCOCO dataset) do not provide focal lengths. You should use the focal lengths of the MuCo dataset for the projection and visualization.

A2. I'm not sure where root_cam comes from, but I guess root_cam is the x-, y-, and z-coordinate of the human root joint in the camera-centered coordinate system. root_depth_list contains the root joint depths of all persons in the input image, so root_cam[2] is identical to one of the entries of root_depth_list.
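
For reference, the back-projection used in demo.py takes pixel x/y plus absolute depth and the intrinsics. Below is a self-contained sketch of what `pixel2cam` does; the `f`/`c` annotation keys follow the MuCo format used in the snippets in this thread.

```python
import numpy as np

# Sketch of pinhole back-projection (what the repo's pixel2cam does).
# pixel_coord is (N, 3): x, y in pixels, z as absolute depth in millimeters;
# f = (fx, fy) and c = (cx, cy) are the GT intrinsics from the MuCo annotation.
def pixel2cam(pixel_coord, f, c):
    x = (pixel_coord[:, 0] - c[0]) / f[0] * pixel_coord[:, 2]
    y = (pixel_coord[:, 1] - c[1]) / f[1] * pixel_coord[:, 2]
    z = pixel_coord[:, 2]
    return np.stack((x, y, z), axis=1)
```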

IemProg commented 3 years ago

This is the code I'm using to generate visualizations for the MuCo dataset. I took one person from an image, but OpenCV does not draw any keypoints on the original image, and the 3D visualization is really messy (I'm using ground-truth coordinates from the JSON file). I couldn't spot where the error is.

I will be very grateful for your help! Thanks!

```python
    n = int(n[0])
    root_depth_list = dataset[n]["root_cam"]
    img_path = dataset[n]["img_path"]
    original_img = cv2.imread(img_path)
    original_img_height, original_img_width = original_img.shape[:2]
    # Normalized camera intrinsics
    focal = dataset[n]["f"]                                 # [1500, 1500] # x-axis, y-axis
    c = dataset[n]["c"]
    princpt = [original_img_width/2, original_img_height/2] # x-axis, y-axis
    bbox = process_bbox(np.array(bbox_imgs[i]), original_img_width, original_img_height)
    img, img2bb_trans = generate_patch_image(original_img, bbox, False, 1.0, 0.0, False)
    imgs_cropped.append(img)

    # Inverse affine transform (restore the crop and resize)
    pose_3d = dataset[n]["joint_img"]
    # restore coordinates to original space
    pred_2d_kpt = pose_3d.copy()
    # only consider eval_joint
    pred_2d_kpt = np.take(pred_2d_kpt, np.arange(21), axis=0)
    pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0]
    pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1]
    pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + root_depth_list[2]

    cvimg = cv2.imread(img_path, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
    tmpimg = cvimg.copy().astype(np.uint8)
    tmpkps = np.zeros((3,joint_num))
    tmpkps[0,:], tmpkps[1,:] = pred_2d_kpt[:,0], pred_2d_kpt[:,1]
    tmpkps[2,:] = 1
    tmpimg = vis_keypoints(tmpimg, tmpkps, skeleton, kp_thresh=0)
    name = folder + "/Center_{}_ID_{}.png".format(str(i), str(n))
    cv2.imwrite(name, tmpimg)

    pred_3d_kpt = pixel2cam(pred_2d_kpt, focal, c)
    name = folder + "/3D_Center_{}_ID_{}.png".format(str(i), str(n))
    output_pose_3d_list = [pred_3d_kpt]
    vis_kps = np.array(output_pose_3d_list)
    tmpkps = np.expand_dims(pose_3d, axis = 0)
    vis_3d_multiple_skeleton(tmpkps, np.ones_like(vis_kps), skeleton, filename = name)
```

mks0601 commented 3 years ago
```python
    # Inverse affine transform (restore the crop and resize)
    pose_3d = dataset[n]["joint_img"]
    # restore coordinates to original space
    pred_2d_kpt = pose_3d.copy()
    # only consider eval_joint
    pred_2d_kpt = np.take(pred_2d_kpt, np.arange(21), axis=0)
    pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0]
    pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1]
    pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + root_depth_list[2]
```

Why are you converting joint_img to the original image space? I guess joint_img is already in the original image space?
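
A quick way to check this, reusing `dataset`, `original_img`, `vis_keypoints`, and `skeleton` from the snippets above, is to draw the GT `joint_img` x/y directly, with no inverse bbox transform. A minimal sketch:

```python
# Minimal check (reuses variables from the snippets above): if joint_img is
# already in original image space, its x/y columns can be drawn as-is.
pose_2d = np.asarray(dataset[n]["joint_img"])[:, :2]
tmpkps = np.ones((3, pose_2d.shape[0]))       # row 2 = confidence 1 for all joints
tmpkps[0, :], tmpkps[1, :] = pose_2d[:, 0], pose_2d[:, 1]
vis_img = vis_keypoints(original_img.copy(), tmpkps, skeleton, kp_thresh=0)
cv2.imwrite("joint_img_check.png", vis_img)
```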

IemProg commented 3 years ago

Thanks! Indeed, it was a mistake; I don't have to transform it.

But the 3D visualization is still not working properly. Do I need to transform the coordinates of dataset[n]["joint_cam"] in some way? Visualizing them directly didn't work.

mks0601 commented 3 years ago

The z-axis of joint_img might contain root-relative depth. Convert it to absolute depth.

IemProg commented 3 years ago

Thanks for your feedback!

I have tried that and it did not work; the code I used is below.

```python
    # Convert from relative to absolute depth
    pose_3d = dataset[n]["joint_img"]

    #pose_3d[:,0] = pose_3d[:,0] / cfg.output_shape[1] * np.array(bbox_imgs[i])[2] + np.array(bbox_imgs[i])[0]
    #pose_3d[:,1] = pose_3d[:,1] / cfg.output_shape[0] * np.array(bbox_imgs[i])[3] + np.array(bbox_imgs[i])[1]

    pose_3d[:,2] = (pose_3d[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + root_depth_list[2]
    pose_3d = pixel2cam(pose_3d, focal, c)
    output_pose_3d_list = [pose_3d]
    vis_kps = np.array(output_pose_3d_list)
    name = folder + "/3D_Center_{}_ID_{}.png".format(str(i), str(n))
    vis_3d_multiple_skeleton(vis_kps, np.ones_like(vis_kps), skeleton, filename = name)
```

mks0601 commented 3 years ago

Try this: `z = dataset[n]['joint_img'] + root_depth_list[2]`

IemProg commented 3 years ago

If you mean adding the depth to the z-axis, yes, I tried that as well.

I also tried visualizing "joint_cam" (the coordinates with respect to the camera) directly, but neither approach works properly for 3D visualization.

Even taking "joint_img" and applying pixel2cam() (with the focal and c parameters from the ground truth), as in demo.py, to convert it from relative to absolute depth does not give the same values as "joint_cam".

I don't understand why this does not work for the MuCo dataset the way it does for MuPoTS.

Thanks!

mks0601 commented 3 years ago

No, I meant you don't have to do this: `(pose_3d[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2)`. Just directly add root_depth_list[2] to joint_img[:,2].
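
In code, the suggestion amounts to the sketch below (reusing `dataset`, `root_depth_list`, `focal`, `c`, and `pixel2cam` from the snippets above). The de-discretization formula only applies to network output, whose z is a heatmap index in [0, depth_dim); if, as suggested above, the GT `joint_img` z is already root-relative millimeters, adding the root depth is enough:

```python
# Sketch of the suggested fix (variables from the snippets above). Assumes GT
# joint_img z is root-relative depth in mm, so no de-discretization is needed.
pose_3d = np.array(dataset[n]["joint_img"], dtype=np.float64)  # copy, don't mutate GT
pose_3d[:, 2] = pose_3d[:, 2] + root_depth_list[2]  # root-relative -> absolute depth
pose_3d = pixel2cam(pose_3d, focal, c)              # back-project with GT intrinsics
```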

IemProg commented 3 years ago

Yes, I tried that. It did not work!

Please, if someone has successfully visualized MuCo poses, share the code with us.

Thanks!

ZXin0305 commented 2 years ago

> MuPoTS dataset

Hello, do you know the camera parameters of the MuPoTS dataset? Thanks!

mks0601 commented 2 years ago

The dataset provides GT 3D/2D joint coordinates. I iteratively fit the 3D coordinates to the 2D coordinates to get the camera parameters.
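
One way to do such a fit is sketched below; this is a minimal illustration, not the author's actual script: the `(fx, fy, cx, cy)` parameterization, the use of `scipy.optimize.least_squares`, and the synthetic data are all assumptions. It minimizes the reprojection error of the GT camera-space joints against the GT pixel coordinates.

```python
import numpy as np
from scipy.optimize import least_squares

# Minimal sketch: recover pinhole intrinsics (fx, fy, cx, cy) by minimizing
# reprojection error. joints_cam is (N, 3) GT camera-space joints (mm),
# joints_img is (N, 2) GT pixel coordinates. Synthetic data for illustration.
def residuals(params, joints_cam, joints_img):
    fx, fy, cx, cy = params
    u = joints_cam[:, 0] / joints_cam[:, 2] * fx + cx
    v = joints_cam[:, 1] / joints_cam[:, 2] * fy + cy
    return np.concatenate([u - joints_img[:, 0], v - joints_img[:, 1]])

# Hypothetical GT: 17 joints at 2.5-5.5 m depth, projected with known intrinsics.
rng = np.random.default_rng(0)
true_params = np.array([1497.0, 1497.0, 1024.0, 576.0])
joints_cam = rng.uniform([-600, -600, 2500], [600, 600, 5500], size=(17, 3))
joints_img = residuals(true_params, joints_cam, np.zeros((17, 2))).reshape(2, -1).T

init = np.array([1500.0, 1500.0, 960.0, 540.0])  # normalized intrinsics as a start
fit = least_squares(residuals, init, args=(joints_cam, joints_img))
print(fit.x)  # should be close to true_params
```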