High-resolution video link: here
This repo is the official PyTorch implementation of [Accurate 3D Hand Pose Estimation for Whole-Body 3D Human Mesh Estimation (CVPRW 2022 Oral)](https://arxiv.org/abs/2011.11534). This repo contains the whole-body codes. For the body-only, hand-only, and face-only codes, visit here.
* Modify the `torchgeometry` kernel code following here.
* Prepare `input.png` and the pre-trained snapshot in the `demo` folder.
* Download the `human_model_files` folder following the `Directory` part below and place it at `common/utils/human_model_files`.
* Go to the `demo` folder and edit `bbox`.
* Run `python demo.py --gpu 0`.
* If you run this code in an ssh environment without a display device, do the following:
  1. Install osmesa following https://pyrender.readthedocs.io/en/latest/install/
  2. Reinstall the specific pyopengl fork: https://github.com/mmatl/pyopengl
  3. Set OpenGL's backend to egl or osmesa via `os.environ["PYOPENGL_PLATFORM"] = "egl"` (see the sketch after this list)
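For step 3, the backend must be chosen before any OpenGL-backed module is imported; a minimal sketch (the choice of `egl` and the import order are the only assumptions here):

```python
# Select a headless OpenGL backend BEFORE pyrender (or anything that imports
# OpenGL) is imported; setting the variable afterwards has no effect.
import os
os.environ["PYOPENGL_PLATFORM"] = "egl"  # or "osmesa"

import pyrender  # now safe to import and render off-screen
```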
The `${ROOT}` directory is described below.
```
${ROOT}
|-- data
|-- demo
|-- main
|-- tool
|-- output
|-- common
| |-- utils
| | |-- human_model_files
| | | |-- smpl
| | | | |-- SMPL_NEUTRAL.pkl
| | | |-- smplx
| | | | |-- MANO_SMPLX_vertex_ids.pkl
| | | | |-- SMPL-X__FLAME_vertex_ids.npy
| | | | |-- SMPLX_NEUTRAL.pkl
| | | | |-- SMPLX_to_J14.pkl
| | | |-- mano
| | | | |-- MANO_LEFT.pkl
| | | | |-- MANO_RIGHT.pkl
| | | |-- flame
| | | | |-- flame_dynamic_embedding.npy
| | | | |-- flame_static_embedding.pkl
| | | | |-- FLAME_NEUTRAL.pkl
```
* `data` contains data loading codes and soft links to the image and annotation directories.
* `demo` contains the demo codes.
* `main` contains high-level codes for training or testing the network.
* `tool` contains pre-processing codes for AGORA and PyTorch model-editing codes.
* `output` contains logs, trained models, visualized outputs, and test results.
* `common` contains kernel codes for Hand4Whole.
* `human_model_files` contains the `smpl`, `smplx`, `mano`, and `flame` 3D model files. Download the files from [smpl] [smplx] [SMPLX_to_J14.pkl] [mano] [flame]; a sketch of loading them appears after the `data` tree below.

You need to follow the directory structure of the `data` folder as below.
```
${ROOT}
|-- data
| |-- AGORA
| | |-- data
| | | |-- AGORA_train.json
| | | |-- AGORA_validation.json
| | | |-- AGORA_test_bbox.json
| | | |-- images_1280x720
| | | |-- images_3840x2160
| | | |-- smplx_params_cam
| | | |-- cam_params
| |-- EHF
| | |-- data
| | | |-- EHF.json
| |-- Human36M
| | |-- images
| | |-- annotations
| |-- MPII
| | |-- data
| | | |-- images
| | | |-- annotations
| |-- MPI_INF_3DHP
| | |-- data
| | | |-- images_1k
| | | |-- MPI-INF-3DHP_1k.json
| | | |-- MPI-INF-3DHP_camera_1k.json
| | | |-- MPI-INF-3DHP_joint_3d.json
| | | |-- MPI-INF-3DHP_SMPL_NeuralAnnot.json
| |-- MSCOCO
| | |-- images
| | | |-- train2017
| | | |-- val2017
| | |-- annotations
| |-- PW3D
| | |-- data
| | | |-- 3DPW_train.json
| | | |-- 3DPW_validation.json
| | | |-- 3DPW_test.json
| | |-- imageFiles
```
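As referenced in the `human_model_files` item above, here is a minimal sketch of loading the SMPL-X model from that folder. It assumes the `smplx` pip package and the directory layout shown in the first tree; the variable names are illustrative.

```python
import smplx

# Point at the folder that contains the smplx/ subfolder from the tree above.
layer = smplx.create(
    'common/utils/human_model_files',
    model_type='smplx',
    gender='neutral',
    use_pca=False,  # full 45-dim axis-angle hand pose instead of PCA components
    ext='pkl',      # the tree above ships SMPLX_NEUTRAL.pkl
)
output = layer()              # zero pose and shape -> template mesh
print(output.vertices.shape)  # (1, 10475, 3) for SMPL-X
```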
You need to follow the directory structure of the `output` folder as below.
```
${ROOT}
|-- output
| |-- log
| |-- model_dump
| |-- result
| |-- vis
```
* Creating the `output` folder as a soft link rather than a regular folder is recommended, because it can take a large amount of storage.
* The `log` folder contains the training log file.
* The `model_dump` folder contains saved checkpoints for each epoch.
* The `result` folder contains the final estimation files generated in the testing stage.
* The `vis` folder contains visualized results.

In `main/config.py`, you can change the datasets to use. The training consists of three stages.
First, in the `main` folder, run `python train.py --gpu 0-3 --lr 1e-4 --continue` to train Hand4Whole on GPUs 0,1,2,3. `--gpu 0,1,2,3` can be used instead of `--gpu 0-3`. To train Hand4Whole from a pre-trained 2D human pose estimation network, download this and place it at `tool`. Then, run `python convert_simple_to_pose2pose.py`, which produces `snapshot_0.pth.tar`. Finally, place `snapshot_0.pth.tar` in `output/model_dump`.
Second, prepare a pre-trained hand-only Pose2Pose:
* Download the pre-trained hand-only Pose2Pose from here.
* Place the hand-only Pose2Pose at `tool/snapshot_12_hand.pth.tar`.
* Also, place the pre-trained Hand4Whole of the first stage at `tool/snapshot_6_all.pth.tar`.
* Then, go to the `tool` folder and run `python merge_hand_to_all.py`.
* Place the generated `snapshot_0.pth.tar` in `output/model_dump`.
Or, you can pre-train hand-only Pose2Pose by yourself: switch to the Pose2Pose branch and train hand-only Pose2Pose on MSCOCO, FreiHAND, and InterHand2.6M. Then (see the merge sketch after this list):
* Move `snapshot_6.pth.tar` of the 1st stage to `tool/snapshot_6_all.pth.tar`.
* Move `snapshot_12.pth.tar` of the 2nd stage to `tool/snapshot_12_hand.pth.tar`.
* Run `python merge_hand_to_all.py` in the `tool` folder.
* Move the generated `snapshot_0.pth.tar` to `output/model_dump`.
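This is not the repo's actual script, but a rough sketch of what a merge step like `merge_hand_to_all.py` amounts to, assuming both snapshots are ordinary PyTorch checkpoints holding a `network` state dict and an `epoch` counter; the `hand_` prefix is a placeholder for whatever the hand-branch parameters are actually named.

```python
import torch

# Load the whole-body (1st stage) and hand-only (2nd stage) snapshots.
all_ckpt = torch.load('snapshot_6_all.pth.tar', map_location='cpu')
hand_ckpt = torch.load('snapshot_12_hand.pth.tar', map_location='cpu')

# Overwrite the hand-branch weights of the whole-body network with the
# pre-trained hand-only weights. 'module.hand_' is an illustrative prefix.
merged = dict(all_ckpt['network'])
for name, param in hand_ckpt['network'].items():
    if name.startswith('module.hand_'):
        merged[name] = param

# Save with the epoch reset to 0 so training restarts from this snapshot.
torch.save({'network': merged, 'epoch': 0}, 'snapshot_0.pth.tar')
```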
Third, in the `main` folder, run `python train.py --gpu 0-3 --lr 1e-5 --continue` to fine-tune Hand4Whole on GPUs 0,1,2,3. `--gpu 0,1,2,3` can be used instead of `--gpu 0-3`.
To test, place the trained model in `output/model_dump/`. In the `main` folder, run `python test.py --gpu 0-3 --test_epoch 6` to test Hand4Whole on GPUs 0,1,2,3 with the 6th-epoch trained model (`snapshot_6.pth.tar`). `--gpu 0,1,2,3` can be used instead of `--gpu 0-3`.
To fine-tune on AGORA, move `snapshot_6.pth.tar`, generated after the 3rd training stage, to `tool` and run `python reset_epoch.py`. Then, move the generated `snapshot_0.pth.tar` to `output/model_dump` and run `python train.py --gpu 0-3 --lr 1e-4` after changing `trainset_3d=['AGORA']`, `trainset_2d=[]`, `testset='AGORA'`, `lr_dec_epoch=[40,60]`, and `end_epoch=70` in `config.py`, as in the sketch below.
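Written out, the `config.py` edits look like this; the assignments come straight from the instructions above, and only their placement in `main/config.py` is assumed:

```python
# main/config.py -- values for the AGORA fine-tuning run described above
trainset_3d = ['AGORA']
trainset_2d = []
testset = 'AGORA'
lr_dec_epoch = [40, 60]  # decay the learning rate at these epochs
end_epoch = 70
```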
For the 3D body-only and hand-only codes, visit here.
* `RuntimeError: Subtraction, the '-' operator, with a bool tensor is not supported. If you are trying to invert a mask, use the '~' or 'logical_not()' operator instead.`: Go to here.
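For reference, the fix described at the linked issue replaces boolean subtraction with logical negation inside `torchgeometry/core/conversions.py` (in `rotation_matrix_to_quaternion`); this is an excerpt of that library edit, not standalone code:

```python
# Before (fails on newer PyTorch: '-' is not defined for bool tensors):
mask_c1 = mask_d2 * (1 - mask_d0_d1)
mask_c2 = (1 - mask_d2) * mask_d0_nd1
mask_c3 = (1 - mask_d2) * (1 - mask_d0_nd1)

# After (invert bool masks with '~' instead):
mask_c1 = mask_d2 * ~mask_d0_d1
mask_c2 = ~mask_d2 * mask_d0_nd1
mask_c3 = ~mask_d2 * ~mask_d0_nd1
```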
Please cite the following if this work is useful for you:

```
@InProceedings{Moon_2022_CVPRW_Hand4Whole,
  author = {Moon, Gyeongsik and Choi, Hongsuk and Lee, Kyoung Mu},
  title = {Accurate 3D Hand Pose Estimation for Whole-Body 3D Human Mesh Estimation},
  booktitle = {Computer Vision and Pattern Recognition Workshop (CVPRW)},
  year = {2022}
}
```