High-resolution video link: here
This repo is the official PyTorch implementation of [Accurate 3D Hand Pose Estimation for Whole-Body 3D Human Mesh Estimation (CVPRW 2022 Oral)](https://arxiv.org/abs/2011.11534). This repo contains the whole-body codes. For the body-only, hand-only, and face-only codes, visit here.
* Modify the `torchgeometry` kernel code following here.
* Prepare `input.png` and the pre-trained snapshot in the `demo` folder.
* Download the `human_model_files` folder following the `Directory` part below and place it at `common/utils/human_model_files`.
* Go to the `demo` folder and edit `bbox`.
* Run `python demo.py --gpu 0`.
* If you run this code in an ssh environment without a display device, do the following:
  1. Install osmesa following https://pyrender.readthedocs.io/en/latest/install/
  2. Reinstall the specific pyopengl fork: https://github.com/mmatl/pyopengl
  3. Set OpenGL's backend to egl or osmesa via `os.environ["PYOPENGL_PLATFORM"] = "egl"` (see the sketch after this list)
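For step 3, the backend must be chosen before any OpenGL-backed module is imported; a minimal sketch (the choice of `egl` and the import order are the only assumptions here):

```python
# Select a headless OpenGL backend BEFORE pyrender (or anything that imports
# OpenGL) is imported; setting the variable afterwards has no effect.
import os
os.environ["PYOPENGL_PLATFORM"] = "egl"  # or "osmesa"

import pyrender  # now safe to import and render off-screen
```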
The `${ROOT}` directory is described below.
```
${ROOT}
|-- data
|-- demo
|-- main
|-- tool
|-- output
|-- common
| |-- utils
| | |-- human_model_files
| | | |-- smpl
| | | | |-- SMPL_NEUTRAL.pkl
| | | |-- smplx
| | | | |-- MANO_SMPLX_vertex_ids.pkl
| | | | |-- SMPL-X__FLAME_vertex_ids.npy
| | | | |-- SMPLX_NEUTRAL.pkl
| | | | |-- SMPLX_to_J14.pkl
| | | |-- mano
| | | | |-- MANO_LEFT.pkl
| | | | |-- MANO_RIGHT.pkl
| | | |-- flame
| | | | |-- flame_dynamic_embedding.npy
| | | | |-- flame_static_embedding.pkl
| | | | |-- FLAME_NEUTRAL.pkl
```
* `data` contains data loading codes and soft links to the image and annotation directories.
* `demo` contains the demo codes.
* `main` contains high-level codes for training or testing the network.
* `tool` contains pre-processing codes for AGORA and PyTorch model-editing codes.
* `output` contains logs, trained models, visualized outputs, and test results.
* `common` contains kernel codes for Hand4Whole.
* `human_model_files` contains the `smpl`, `smplx`, `mano`, and `flame` 3D model files. Download the files from [smpl] [smplx] [SMPLX_to_J14.pkl] [mano] [flame]; a sketch of loading them appears after the `data` tree below.

You need to follow the directory structure of the `data` folder as below.
```
${ROOT}
|-- data
| |-- AGORA
| | |-- data
| | | |-- AGORA_train.json
| | | |-- AGORA_validation.json
| | | |-- AGORA_test_bbox.json
| | | |-- images_1280x720
| | | |-- images_3840x2160
| | | |-- smplx_params_cam
| | | |-- cam_params
| |-- EHF
| | |-- data
| | | |-- EHF.json
| |-- Human36M
| | |-- images
| | |-- annotations
| |-- MPII
| | |-- data
| | | |-- images
| | | |-- annotations
| |-- MPI_INF_3DHP
| | |-- data
| | | |-- images_1k
| | | |-- MPI-INF-3DHP_1k.json
| | | |-- MPI-INF-3DHP_camera_1k.json
| | | |-- MPI-INF-3DHP_joint_3d.json
| | | |-- MPI-INF-3DHP_SMPL_NeuralAnnot.json
| |-- MSCOCO
| | |-- images
| | | |-- train2017
| | | |-- val2017
| | |-- annotations
| |-- PW3D
| | |-- data
| | | |-- 3DPW_train.json
| | | |-- 3DPW_validation.json
| | | |-- 3DPW_test.json
| | |-- imageFiles
```
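As referenced in the `human_model_files` item above, here is a minimal sketch of loading the SMPL-X model from that folder. It assumes the `smplx` pip package and the directory layout shown in the first tree; the variable names are illustrative.

```python
import smplx

# Point at the folder that contains the smplx/ subfolder from the tree above.
layer = smplx.create(
    'common/utils/human_model_files',
    model_type='smplx',
    gender='neutral',
    use_pca=False,  # full 45-dim axis-angle hand pose instead of PCA components
    ext='pkl',      # the tree above ships SMPLX_NEUTRAL.pkl
)
output = layer()              # zero pose and shape -> template mesh
print(output.vertices.shape)  # (1, 10475, 3) for SMPL-X
```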
You need to follow the directory structure of the `output` folder as below.
```
${ROOT}
|-- output
| |-- log
| |-- model_dump
| |-- result
| |-- vis
```
* Creating the `output` folder as a soft link rather than a regular folder is recommended, because it can take a large amount of storage.
* The `log` folder contains the training log file.
* The `model_dump` folder contains saved checkpoints for each epoch.
* The `result` folder contains the final estimation files generated in the testing stage.
* The `vis` folder contains visualized results.

In `main/config.py`, you can change the datasets to use. The training consists of three stages.
First, in the `main` folder, run `python train.py --gpu 0-3 --lr 1e-4 --continue` to train Hand4Whole on GPUs 0,1,2,3. `--gpu 0,1,2,3` can be used instead of `--gpu 0-3`. To train Hand4Whole from a pre-trained 2D human pose estimation network, download this and place it at `tool`. Then, run `python convert_simple_to_pose2pose.py`, which produces `snapshot_0.pth.tar`. Finally, place `snapshot_0.pth.tar` in `output/model_dump`.
Second, prepare a pre-trained hand-only Pose2Pose:
* Download the pre-trained hand-only Pose2Pose from here.
* Place the hand-only Pose2Pose at `tool/snapshot_12_hand.pth.tar`.
* Also, place the pre-trained Hand4Whole of the first stage at `tool/snapshot_6_all.pth.tar`.
* Then, go to the `tool` folder and run `python merge_hand_to_all.py`.
* Place the generated `snapshot_0.pth.tar` in `output/model_dump`.
Or, you can pre-train hand-only Pose2Pose by yourself: switch to the Pose2Pose branch and train hand-only Pose2Pose on MSCOCO, FreiHAND, and InterHand2.6M. Then (see the merge sketch after this list):
* Move `snapshot_6.pth.tar` of the 1st stage to `tool/snapshot_6_all.pth.tar`.
* Move `snapshot_12.pth.tar` of the 2nd stage to `tool/snapshot_12_hand.pth.tar`.
* Run `python merge_hand_to_all.py` in the `tool` folder.
* Move the generated `snapshot_0.pth.tar` to `output/model_dump`.
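This is not the repo's actual script, but a rough sketch of what a merge step like `merge_hand_to_all.py` amounts to, assuming both snapshots are ordinary PyTorch checkpoints holding a `network` state dict and an `epoch` counter; the `hand_` prefix is a placeholder for whatever the hand-branch parameters are actually named.

```python
import torch

# Load the whole-body (1st stage) and hand-only (2nd stage) snapshots.
all_ckpt = torch.load('snapshot_6_all.pth.tar', map_location='cpu')
hand_ckpt = torch.load('snapshot_12_hand.pth.tar', map_location='cpu')

# Overwrite the hand-branch weights of the whole-body network with the
# pre-trained hand-only weights. 'module.hand_' is an illustrative prefix.
merged = dict(all_ckpt['network'])
for name, param in hand_ckpt['network'].items():
    if name.startswith('module.hand_'):
        merged[name] = param

# Save with the epoch reset to 0 so training restarts from this snapshot.
torch.save({'network': merged, 'epoch': 0}, 'snapshot_0.pth.tar')
```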
Third, in the `main` folder, run `python train.py --gpu 0-3 --lr 1e-5 --continue` to fine-tune Hand4Whole on GPUs 0,1,2,3. `--gpu 0,1,2,3` can be used instead of `--gpu 0-3`.
To test, place the trained model in `output/model_dump/`. In the `main` folder, run `python test.py --gpu 0-3 --test_epoch 6` to test Hand4Whole on GPUs 0,1,2,3 with the 6th-epoch trained model (`snapshot_6.pth.tar`). `--gpu 0,1,2,3` can be used instead of `--gpu 0-3`.
To fine-tune on AGORA, move `snapshot_6.pth.tar`, generated after the 3rd training stage, to `tool` and run `python reset_epoch.py`. Then, move the generated `snapshot_0.pth.tar` to `output/model_dump` and run `python train.py --gpu 0-3 --lr 1e-4` after changing `trainset_3d=['AGORA']`, `trainset_2d=[]`, `testset='AGORA'`, `lr_dec_epoch=[40,60]`, and `end_epoch=70` in `config.py`, as in the sketch below.
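Written out, the `config.py` edits look like this; the assignments come straight from the instructions above, and only their placement in `main/config.py` is assumed:

```python
# main/config.py -- values for the AGORA fine-tuning run described above
trainset_3d = ['AGORA']
trainset_2d = []
testset = 'AGORA'
lr_dec_epoch = [40, 60]  # decay the learning rate at these epochs
end_epoch = 70
```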
For the 3D body-only and hand-only codes, visit here.
* `RuntimeError: Subtraction, the '-' operator, with a bool tensor is not supported. If you are trying to invert a mask, use the '~' or 'logical_not()' operator instead.`: Go to here.
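For reference, the fix described at the linked issue replaces boolean subtraction with logical negation inside `torchgeometry/core/conversions.py` (in `rotation_matrix_to_quaternion`); this is an excerpt of that library edit, not standalone code:

```python
# Before (fails on newer PyTorch: '-' is not defined for bool tensors):
mask_c1 = mask_d2 * (1 - mask_d0_d1)
mask_c2 = (1 - mask_d2) * mask_d0_nd1
mask_c3 = (1 - mask_d2) * (1 - mask_d0_nd1)

# After (invert bool masks with '~' instead):
mask_c1 = mask_d2 * ~mask_d0_d1
mask_c2 = ~mask_d2 * mask_d0_nd1
mask_c3 = ~mask_d2 * ~mask_d0_nd1
```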
Please cite the following if this work is useful for you:

```
@InProceedings{Moon_2022_CVPRW_Hand4Whole,
  author = {Moon, Gyeongsik and Choi, Hongsuk and Lee, Kyoung Mu},
  title = {Accurate 3D Hand Pose Estimation for Whole-Body 3D Human Mesh Estimation},
  booktitle = {Computer Vision and Pattern Recognition Workshop (CVPRW)},
  year = {2022}
}
```