open-mmlab / mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.
https://mmpose.readthedocs.io/en/latest/
Apache License 2.0

2D Human Pose for H36M Topology #1691

Closed buckeye17 closed 1 year ago

buckeye17 commented 2 years ago

Reproduction

I'm attempting to predict 2D human pose for a given image. I'm able to successfully make these predictions when using the following settings:

import torch
from mmpose.apis import init_pose_model

# Fall back to CPU when no GPU is available.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
pose_model = init_pose_model(
    "configs/body/2d_kpt_sview_rgb_img/deeppose/coco/res152_coco_384x288_rle.py",
    "https://download.openmmlab.com/mmpose/top_down/deeppose/deeppose_res152_coco_384x288_rle-b77c4c37_20220624.pth",
    device=device,
)

But when I attempt to predict H36M topology by only changing the following settings, the result doesn't look anything like a skeleton. Any ideas what I am doing wrong?

device = 'cuda' if torch.cuda.is_available() else 'cpu'
pose_model = init_pose_model(
    "configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/h36m/hrnet_w48_h36m_256x256.py",
    "https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_h36m_256x256-78e88d08_20210621.pth",
    device=device,
)

Environment

sys.platform: linux
Python: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 2080 Ti
CUDA_HOME: None
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.12.0
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.3.2  (built against CUDA 11.5)
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.13.0
OpenCV: 4.6.0
MMCV: 1.6.1
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.3
MMPose: 0.28.1+unknown

MMPose was installed using pip.

Thanks for your help!

ly015 commented 2 years ago

This is most likely a performance issue. The H36M dataset was collected in a constrained environment with a consistent background and a limited number of human subjects. As a result, a model trained on H36M usually generalizes poorly to unconstrained settings, such as COCO images or images collected by users.

buckeye17 commented 2 years ago

Thanks for the explanation! I am trying to produce H36M skeletons because they appear to be required for lifting 2D skeletons to 3D. Does MMPose provide another way to generate 2D skeletons that can be lifted to 3D?

ly015 commented 2 years ago

You can use a 2D pose model trained on the COCO dataset and manually convert its output to the H36M format for pose lifting. Our pose lifting demo follows this approach. More details can be found in the demo doc and the coco-to-h36m conversion code.
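
For readers who want a feel for what such a conversion involves, here is a minimal sketch. It is not the MMPose implementation. The function name `coco_to_h36m` is hypothetical, and the joint orderings are assumptions based on the commonly used 17-keypoint COCO and H36M layouts. The pelvis, spine, thorax, and head have no direct COCO counterpart, so they are approximated as midpoints, and the nose stands in for the H36M neck/nose joint. Check the indices against the conversion code in the MMPose demo before relying on them.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# (H36M index, COCO index) pairs for joints assumed to exist in both layouts.
# Assumed COCO order: nose, eyes, ears, shoulders, elbows, wrists, hips,
# knees, ankles (left before right).
_DIRECT = [
    (1, 12), (2, 14), (3, 16),   # right hip, knee, foot
    (4, 11), (5, 13), (6, 15),   # left hip, knee, foot
    (9, 0),                      # neck/nose joint, approximated by COCO nose
    (11, 5), (12, 7), (13, 9),   # left shoulder, elbow, wrist
    (14, 6), (15, 8), (16, 10),  # right shoulder, elbow, wrist
]


def _mid(a: Sequence[float], b: Sequence[float]) -> Point:
    """Return the midpoint of two (x, y) points."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def coco_to_h36m(coco: Sequence[Sequence[float]]) -> List[Point]:
    """Map 17 COCO keypoints (x, y) to an H36M-style 17-joint skeleton."""
    if len(coco) != 17:
        raise ValueError("expected 17 COCO keypoints")

    h36m: List[Point] = [(0.0, 0.0)] * 17
    for h_idx, c_idx in _DIRECT:
        h36m[h_idx] = (float(coco[c_idx][0]), float(coco[c_idx][1]))

    pelvis = _mid(coco[11], coco[12])   # between left and right hip
    thorax = _mid(coco[5], coco[6])     # between left and right shoulder
    h36m[0] = pelvis
    h36m[8] = thorax
    h36m[7] = _mid(pelvis, thorax)      # spine: between pelvis and thorax
    h36m[10] = _mid(coco[1], coco[2])   # head: between left and right eye
    return h36m
```

The function accepts any sequence of (x, y) pairs, so the rows of a keypoint array from a COCO-trained model can be passed in after dropping the score column. The converted skeleton is only an approximation of true H36M annotations, which is worth keeping in mind when judging the quality of the lifted 3D pose.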