open-mmlab / mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.
https://mmpose.readthedocs.io/en/latest/
Apache License 2.0

can body3d_two_stage_img_demo use videopose3d one frame model? #1396

Open darcula1993 opened 2 years ago

darcula1993 commented 2 years ago

According to the model architecture, the videopose3d_1frame model should be able to perform 2D-to-3D lifting inference on images. But when I try, I get this error:

  File "demo/body3d_two_stage_img_demo.py", line 296, in <module>
    main()
  File "demo/body3d_two_stage_img_demo.py", line 243, in main
    with_track_id=False)
  File "/lixinwei/mmpose/mmpose/apis/inference_3d.py", line 332, in inference_pose_lifter_model
    data = test_pipeline(data)
  File "/lixinwei/mmpose/mmpose/datasets/pipelines/shared_transform.py", line 107, in __call__
    data = t(data)
  File "/lixinwei/mmpose/mmpose/datasets/pipelines/pose3d_transform.py", line 162, in __call__
    [0.5 * results['image_width'], 0.5 * results['image_height']],
KeyError: 'image_width'

Where does this key come from?
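
For context, the transform at the failing line indexes the results dict directly, so any pipeline that does not populate `image_width`/`image_height` before that step will raise exactly this KeyError. A minimal sketch of the failure mode (illustrative only, not the actual mmpose transform code):

```python
# Minimal reproduction of the failing access pattern: the normalization
# step assumes 'image_width' and 'image_height' are already present in
# the results dict produced by earlier pipeline stages.
def image_center(results):
    # mirrors the spirit of pose3d_transform.py line 162 (not verbatim)
    return [0.5 * results['image_width'], 0.5 * results['image_height']]

ok = image_center({'image_width': 1920, 'image_height': 1080})  # [960.0, 540.0]

try:
    image_center({})  # keys never set by the demo pipeline
except KeyError as e:
    missing = str(e)  # "'image_width'"
```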

ly015 commented 2 years ago

Thanks for your feedback. Could you please provide the command that raised this error so we can locate the problem?

pallgeuer commented 2 years ago

I just came to GitHub to report similar/related problems. I tested with:

python demo/body3d_two_stage_video_demo.py demo/mmdetection_cfg/faster_rcnn_r50_fpn_coco.py https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w48_coco_256x192.py https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth configs/body/3d_kpt_sview_rgb_vid/video_pose_lift/mpi_inf_3dhp/videopose3d_mpi-inf-3dhp_1frame_fullconv_supervised_gt.py https://download.openmmlab.com/mmpose/body3d/videopose/videopose_mpi-inf-3dhp_1frame_fullconv_supervised_gt-d6ed21ef_20210603.pth --video-path VIDEO.mp4 --rebase-keypoint-height --show

and got:

Stage 1: 2D pose detection.
Initializing model...
load checkpoint from http path: https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
load checkpoint from http path: https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth
Running 2D pose detection inference...
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ] 239/240, 3.8 task/s, elapsed: 63s, ETA:     0s
Stage 2: 2D-to-3D pose lifting.
Initializing model...
load checkpoint from http path: https://download.openmmlab.com/mmpose/body3d/videopose/videopose_mpi-inf-3dhp_1frame_fullconv_supervised_gt-d6ed21ef_20210603.pth
Traceback (most recent call last):
  File "/home/allgeuer/Programs/DeepStack/envs/mmdev/mmpose/demo/body3d_two_stage_video_demo.py", line 377, in <module>
    main()
  File "/home/allgeuer/Programs/DeepStack/envs/mmdev/mmpose/demo/body3d_two_stage_video_demo.py", line 290, in main
    res['keypoints'] = convert_keypoint_definition(
  File "/home/allgeuer/Programs/DeepStack/envs/mmdev/mmpose/demo/body3d_two_stage_video_demo.py", line 62, in convert_keypoint_definition
    raise NotImplementedError
NotImplementedError

A quick check of convert_keypoint_definition() shows that the only 3D keypoint definition that is supported is Body3DH36MDataset, which is incompatible with videopose3d. This should be fixed.
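
A manual workaround amounts to remapping the 2D keypoint indices into the target dataset's joint order before lifting. A hedged sketch of that kind of conversion (the index map below is illustrative only, not the real COCO-to-MPI-INF-3DHP layout):

```python
import numpy as np

def remap_keypoints(kpts, index_map):
    """Reorder source keypoints (J x C array) into a target joint order.

    index_map[dst] = src means target joint `dst` is taken from source
    joint `src`. Real conversions may also need to synthesize joints
    (e.g. averaging hips to get a pelvis), which is omitted here.
    """
    out = np.zeros((len(index_map), kpts.shape[1]), dtype=kpts.dtype)
    for dst, src in enumerate(index_map):
        out[dst] = kpts[src]
    return out

# toy example: 3 source joints reordered into a 3-joint target definition
src = np.array([[0., 0., 1.], [1., 1., 1.], [2., 2., 1.]])
dst = remap_keypoints(src, [2, 0, 1])
```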

I have my own code that uses the MMPose API to bypass this with a manual conversion, but then you hit a problem with the following lines of code from mmpose/apis/inference_3d.py:

        assert 'stats_info' in dataset_info._dataset_info
        bbox_center = dataset_info._dataset_info['stats_info']['bbox_center']
        bbox_scale = dataset_info._dataset_info['stats_info']['bbox_scale']

The problem is that stats_info is not defined in the configs/_base_/datasets/mpi_inf_3dhp.py file. As a quick fix I just added the following line (copied from H36M):

stats_info=dict(bbox_center=(528., 427.), bbox_scale=400.),

Aside from implementing the required keypoint definition conversion that I already pointed out above, could you please compute the mean bbox center and scale for the mpi_inf_3dhp dataset and add a correct stats_info line like the one in my example?
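
Computing those statistics is straightforward once you have the ground-truth boxes; a sketch of the calculation (the `[x, y, w, h]` annotation format and the max(w, h) definition of scale are assumptions, so check them against how mmpose derived the H36M values):

```python
# Hypothetical sketch: derive mean bbox center and scale over a dataset's
# annotations, the numbers a stats_info entry would need.
def compute_bbox_stats(bboxes):
    """bboxes: list of [x, y, w, h] boxes (assumed format)."""
    n = len(bboxes)
    cx = sum(x + w / 2 for x, y, w, h in bboxes) / n
    cy = sum(y + h / 2 for x, y, w, h in bboxes) / n
    scale = sum(max(w, h) for x, y, w, h in bboxes) / n  # assumed definition
    return (cx, cy), scale

center, scale = compute_bbox_stats([[100, 50, 200, 400], [300, 150, 180, 300]])
# -> center=(295.0, 275.0), scale=350.0
```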

darcula1993 commented 2 years ago

This is my command:

python demo/body3d_two_stage_img_demo.py \
    configs/body/3d_kpt_sview_rgb_vid/video_pose_lift/h36m/videopose3d_h36m_1frame_fullconv_supervised_cpn_ft.py \
    /lixinwei/mmpose/work_dirs/videopose3d_h36m_1frame_fullconv_supervised_cpn_ft/best_MPJPE_epoch_150.pth \
    --json-file tests/data/h36m/h36m_coco.json \
    --img-root tests/data/h36m \
    --camera-param-file tests/data/h36m/cameras.pkl \
    --only-second-stage \
    --out-img-root vis_results \
    --rebase-keypoint-height \
    --show-ground-truth

liqikai9 commented 2 years ago

@darcula1993 I have tried your command and encountered the same problem.

Actually, the pipelines for image input and for single-frame video input are slightly different in the config files. For example, you can check these two config files:

simplebaseline3d_h36m.py

videopose3d_h36m_1frame_fullconv_supervised_cpn_ft.py

for more details.

So if you want to run the image demo, you should use a config file from this folder: https://github.com/open-mmlab/mmpose/tree/master/configs/body/3d_kpt_sview_rgb_img/pose_lift.

pallgeuer commented 2 years ago

I think what both of us were trying to achieve is to run the videopose3d model that only uses one frame at a time as input, on video input. This is not the same as trying to run that model on a single input image. With a different config the model may run fine as part of the image demo, but that's not what we are trying to do.

I have quite explicitly documented in my previous post what it would take to resolve the errors I have seen - is there any update on that?

liqikai9 commented 2 years ago

@pallgeuer Thanks for your interest in this issue.

I think the main concern of this issue is whether we can use the videopose3d one-frame model to do inference on image input, not on video input. Actually, the model intended for the image demo is SimpleBaseline3D and the one for the video demo is VideoPose3D, but they are both implemented as a unified PoseLifter in mmpose, which may cause some confusion.

So my advice is to run the image demo using one of the SimpleBaseline3D models listed in this folder: https://github.com/open-mmlab/mmpose/tree/master/configs/body/3d_kpt_sview_rgb_img/pose_lift.

A quick check of convert_keypoint_definition() shows that the only 3D keypoint definition that is supported is Body3DH36MDataset, which is incompatible with videopose3d. This should be fixed.

Here, you have pointed out that we can only run the demo script on Body3DH36MDataset and cannot run it on other datasets such as Body3DMpiInf3dhpDataset, due to the limitations of the convert_keypoint_definition function.

We will check the demo script ASAP. You can also open a separate issue for this discussion.

pallgeuer commented 2 years ago

Okay, yes, rechecking the original issue, it was indeed related to image input, not video; my mistake. As a goal for mmpose, though, I see no reason why it shouldn't be possible to run the model on either an image or a video input. I haven't tried image input, but I have successfully run it on video with the simple one-line change I mentioned in my first post, in addition to fixing convert_keypoint_definition of course. So it seems it could feasibly be made to work for both image and video without a great amount of change or effort. That's what I was trying to say.

liqikai9 commented 2 years ago

@pallgeuer Thanks for your nice suggestion! Would you like to raise a PR to fix this problem you mentioned?