Issues running body3d_two_stage_video_demo.py using videopose3d model

(New issue based on my comments on #1396.)

I am attempting to run the two stage body3d video demo using the 1-frame videopose3d model:

python demo/body3d_two_stage_video_demo.py \
    demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py \
    https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
    configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w48_coco_256x192.py \
    https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth \
    configs/body/3d_kpt_sview_rgb_vid/video_pose_lift/mpi_inf_3dhp/videopose3d_mpi-inf-3dhp_1frame_fullconv_supervised_gt.py \
    https://download.openmmlab.com/mmpose/body3d/videopose/videopose_mpi-inf-3dhp_1frame_fullconv_supervised_gt-d6ed21ef_20210603.pth \
    --video-path VIDEO.mp4 \
    --rebase-keypoint-height \
    --show

I get:

Stage 1: 2D pose detection.
Initializing model...
load checkpoint from http path: https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
load checkpoint from http path: https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth
Running 2D pose detection inference...
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ] 239/240, 3.8 task/s, elapsed: 63s, ETA:     0s
Stage 2: 2D-to-3D pose lifting.
Initializing model...
load checkpoint from http path: https://download.openmmlab.com/mmpose/body3d/videopose/videopose_mpi-inf-3dhp_1frame_fullconv_supervised_gt-d6ed21ef_20210603.pth
Traceback (most recent call last):
  File "/home/allgeuer/Programs/DeepStack/envs/mmdev/mmpose/demo/body3d_two_stage_video_demo.py", line 377, in <module>
    main()
  File "/home/allgeuer/Programs/DeepStack/envs/mmdev/mmpose/demo/body3d_two_stage_video_demo.py", line 290, in main
    res['keypoints'] = convert_keypoint_definition(
  File "/home/allgeuer/Programs/DeepStack/envs/mmdev/mmpose/demo/body3d_two_stage_video_demo.py", line 62, in convert_keypoint_definition
    raise NotImplementedError
NotImplementedError

A quick check of convert_keypoint_definition() shows that the only supported 3D keypoint definition is Body3DH36MDataset, so the demo cannot run with the MPI-INF-3DHP videopose3d model. This should be fixed.

I have my own code that uses the MMPose API and bypasses this with a manual conversion. With that in place, the next problem comes from the following lines of code in mmpose/apis/inference_3d.py:

        assert 'stats_info' in dataset_info._dataset_info
        bbox_center = dataset_info._dataset_info['stats_info']['bbox_center']
        bbox_scale = dataset_info._dataset_info['stats_info']['bbox_scale']

The problem is that stats_info is not defined in the configs/_base_/datasets/mpi_inf_3dhp.py file. As a quick fix I just added the following line (copied from H36M):

stats_info=dict(bbox_center=(528., 427.), bbox_scale=400.),
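The same workaround could also be applied at runtime instead of editing the config file. Here is a minimal sketch that operates on a plain dict standing in for dataset_info._dataset_info. The helper name ensure_stats_info is hypothetical (it is not part of MMPose), and the default values are the H36M ones from the line above, so they are placeholders rather than correct MPI-INF-3DHP statistics:

```python
def ensure_stats_info(info, bbox_center=(528., 427.), bbox_scale=400.):
    """Add a fallback 'stats_info' entry to a dataset-info dict if it is missing.

    The defaults are the H36M values (placeholders, not MPI-INF-3DHP statistics).
    An existing 'stats_info' entry is left untouched.
    """
    if 'stats_info' not in info:
        info['stats_info'] = dict(bbox_center=bbox_center, bbox_scale=bbox_scale)
    return info


# Stand-in for dataset_info._dataset_info (assumed to be a plain dict)
mpi_info = dict(dataset_name='mpi_inf_3dhp')
ensure_stats_info(mpi_info)

# A dict that already has stats_info keeps its own values
h36m_info = dict(stats_info=dict(bbox_center=(1., 2.), bbox_scale=3.))
ensure_stats_info(h36m_info)
```

This only silences the assertion. The lifted poses will still depend on how well the borrowed H36M values match the actual data.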

Aside from implementing the keypoint definition conversion mentioned above, could you please compute the mean bbox center and scale for the mpi_inf_3dhp dataset and add a correct stats_info line like the one in my example?
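For reference, this is roughly the computation I have in mind, as a sketch only. It assumes the training bounding boxes are available as (x1, y1, x2, y2) pixel coordinates and that the scale of a box is its longer side. Whether that matches how the H36M values were derived is an assumption on my part, and compute_bbox_stats is a hypothetical helper, not an MMPose function:

```python
import numpy as np


def compute_bbox_stats(bboxes_xyxy):
    """Return the mean bbox center and mean bbox scale as a stats_info-style dict.

    bboxes_xyxy: array-like of shape (N, 4) with rows (x1, y1, x2, y2).
    The scale of one box is taken to be its longer side (an assumption).
    """
    bboxes = np.asarray(bboxes_xyxy, dtype=np.float64)
    centers = (bboxes[:, :2] + bboxes[:, 2:4]) / 2            # (N, 2) box centers
    scales = np.max(bboxes[:, 2:4] - bboxes[:, :2], axis=1)   # (N,) longer side
    center = tuple(float(v) for v in centers.mean(axis=0))
    return dict(bbox_center=center, bbox_scale=float(scales.mean()))


# Two made-up boxes, purely for illustration (not real MPI-INF-3DHP data)
example_bboxes = [[100, 200, 300, 600], [200, 100, 500, 300]]
stats_info = compute_bbox_stats(example_bboxes)
```

Running this over the real dataset annotations would give values that could replace the H36M numbers I copied.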

My manual implementation of convert_keypoint_definition() is as follows (the name and calling convention differ slightly because it is part of my own code, but it should be easy to transfer):

import copy

import numpy as np

# Convert 2D keypoint detections so that they are compatible with the definitions required for 3D keypoint lifting
def convert_2d_results_for_3d(dd_results, dd_dataset, ddd_dataset, in_place=False):
    dd_results_ddd = dd_results if in_place else copy.deepcopy(dd_results)
    dd_is_aic = dd_dataset in ('TopDownAicDataset', 'BottomUpAicDataset')
    dd_is_coco = dd_dataset in ('TopDownCocoDataset', 'BottomUpCocoDataset')
    dd_is_crowd = dd_dataset in ('TopDownCrowdPoseDataset', 'BottomUpCrowdPoseDataset')
    dd_is_h36m = dd_dataset in ('TopDownH36MDataset', 'BottomUpH36MDataset')
    ddd_is_h36m = ddd_dataset == 'Body3DH36MDataset'
    ddd_is_mpiinf = ddd_dataset == 'Body3DMpiInf3dhpDataset'
    if ddd_is_h36m:
        if dd_is_h36m:
            pass
        elif dd_is_aic:
            for dd_result in dd_results_ddd:
                keypoints = dd_result['keypoints']
                keypoints_new = np.zeros((17, keypoints.shape[1]), dtype=keypoints.dtype)
                keypoints_new[0] = (keypoints[9] + keypoints[6]) / 2              # Root (pelvis) is in the middle of l_hip and r_hip
                keypoints_new[8] = keypoints[13]                                  # Thorax (bottom end of neck) is neck
                keypoints_new[7] = (keypoints_new[0] + keypoints_new[8]) / 2      # Spine (centre of torso) is in the middle of root and thorax
                keypoints_new[9] = (3 * keypoints[13] + keypoints[12]) / 4        # Neck base (top end of neck) is 1/4 the way from neck (bottom end of neck) to head top
                keypoints_new[10] = (5 * keypoints[13] + 7 * keypoints[12]) / 12  # Head (spherical centre of head) is 7/12 the way from neck (bottom end of neck) to head top
                keypoints_new[[1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 16]] = keypoints[[6, 7, 8, 9, 10, 11, 3, 4, 5, 0, 1, 2]]  # Arms and legs
                dd_result['keypoints'] = keypoints_new
        elif dd_is_coco:
            for dd_result in dd_results_ddd:
                keypoints = dd_result['keypoints']
                keypoints_new = np.zeros((17, keypoints.shape[1]), dtype=keypoints.dtype)
                keypoints_new[0] = (keypoints[11] + keypoints[12]) / 2         # Root (pelvis) is in the middle of l_hip and r_hip
                keypoints_new[8] = (keypoints[5] + keypoints[6]) / 2           # Thorax (bottom end of neck) is in the middle of l_shoulder and r_shoulder
                keypoints_new[7] = (keypoints_new[0] + keypoints_new[8]) / 2   # Spine (centre of torso) is in the middle of root and thorax
                keypoints_new[10] = (keypoints[3] + keypoints[4]) / 2          # Head (spherical centre of head) is in the middle of l_ear and r_ear
                keypoints_new[9] = (keypoints_new[10] + keypoints_new[8]) / 2  # Neck base (top end of neck) is in the middle of head and thorax
                keypoints_new[[1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 16]] = keypoints[[12, 14, 16, 11, 13, 15, 5, 7, 9, 6, 8, 10]]  # Arms and legs
                dd_result['keypoints'] = keypoints_new
        elif dd_is_crowd:
            for dd_result in dd_results_ddd:
                keypoints = dd_result['keypoints']
                keypoints_new = np.zeros((17, keypoints.shape[1]), dtype=keypoints.dtype)
                keypoints_new[0] = (keypoints[6] + keypoints[7]) / 2              # Root (pelvis) is in the middle of l_hip and r_hip
                keypoints_new[8] = keypoints[13]                                  # Thorax (bottom end of neck) is neck
                keypoints_new[7] = (keypoints_new[0] + keypoints_new[8]) / 2      # Spine (centre of torso) is in the middle of root and thorax
                keypoints_new[9] = (3 * keypoints[13] + keypoints[12]) / 4        # Neck base (top end of neck) is 1/4 the way from neck (bottom end of neck) to top head
                keypoints_new[10] = (5 * keypoints[13] + 7 * keypoints[12]) / 12  # Head (spherical centre of head) is 7/12 the way from neck (bottom end of neck) to top head
                keypoints_new[[1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 16]] = keypoints[[7, 9, 11, 6, 8, 10, 0, 2, 4, 1, 3, 5]]  # Arms and legs
                dd_result['keypoints'] = keypoints_new
        else:
            raise NotImplementedError(f"Incompatible 2D dataset for 3D dataset {ddd_dataset}: {dd_dataset}")
    elif ddd_is_mpiinf:
        if dd_is_aic:
            for dd_result in dd_results_ddd:
                keypoints = dd_result['keypoints']
                keypoints_new = np.zeros((17, keypoints.shape[1]), dtype=keypoints.dtype)
                keypoints_new[0] = keypoints[12]                                  # Head top is head top
                keypoints_new[1] = keypoints[13]                                  # Neck (bottom end of neck) is neck
                keypoints_new[14] = (keypoints[9] + keypoints[6]) / 2             # Root (pelvis) is in the middle of l_hip and r_hip
                keypoints_new[15] = (keypoints_new[1] + keypoints_new[14]) / 2    # Spine (centre of torso) is in the middle of neck and root
                keypoints_new[16] = (5 * keypoints[13] + 7 * keypoints[12]) / 12  # Head (spherical centre of head) is 7/12 the way from neck to head top
                keypoints_new[2:14] = keypoints[0:12]                             # Arms and legs
                dd_result['keypoints'] = keypoints_new
        elif dd_is_coco:
            for dd_result in dd_results_ddd:
                keypoints = dd_result['keypoints']
                keypoints_new = np.zeros((17, keypoints.shape[1]), dtype=keypoints.dtype)
                keypoints_new[1] = (keypoints[5] + keypoints[6]) / 2                          # Neck (bottom end of neck) is in the middle of l_shoulder and r_shoulder
                keypoints_new[14] = (keypoints[11] + keypoints[12]) / 2                       # Root (pelvis) is in the middle of l_hip and r_hip
                keypoints_new[15] = (keypoints_new[1] + keypoints_new[14]) / 2                # Spine (centre of torso) is in the middle of neck and root
                keypoints_new[16] = (keypoints[3] + keypoints[4]) / 2                         # Head (spherical centre of head) is in the middle of l_ear and r_ear
                keypoints_new[0] = (12 * keypoints_new[16] - 5 * keypoints_new[1]) / 7        # Head top is extrapolated from neck and head
                keypoints_new[0, 2] = keypoints_new[16, 2]                                    # Don't extrapolate the head top confidence score
                keypoints_new[2:14] = keypoints[[6, 8, 10, 5, 7, 9, 12, 14, 16, 11, 13, 15]]  # Arms and legs
                dd_result['keypoints'] = keypoints_new
        else:
            raise NotImplementedError(f"Incompatible 2D dataset for 3D dataset {ddd_dataset}: {dd_dataset}")
    else:
        raise NotImplementedError(f"Unsupported 3D dataset: {ddd_dataset}")
    return dd_results_ddd
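
To show where the COCO head top extrapolation comes from: the AIC branches place the head 7/12 of the way from the neck to the head top, and the COCO branch for MPI-INF-3DHP inverts that same relation because COCO has no head top keypoint. A small self-contained check with made-up coordinates (the variable names are for illustration only):

```python
import numpy as np

# Made-up 2D positions in pixels
neck = np.array([100.0, 200.0])
head_top = np.array([100.0, 80.0])

# Forward relation used in the AIC branches:
# head is 7/12 of the way from neck to head top
head = (5 * neck + 7 * head_top) / 12

# Inverse relation used in the COCO branch:
# solve the line above for head_top given head and neck
head_top_recovered = (12 * head - 5 * neck) / 7

assert np.allclose(head_top_recovered, head_top)
```

The confidence score is not extrapolated this way, which is why the COCO branch copies the head score to the head top separately.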
