shreyashampali / HOnnotate

CVPR2020. HOnnotate: A method for 3D Annotation of Hand and Object Poses
https://www.tugraz.at/index.php?id=40231
GNU General Public License v3.0

HOnnotate: A method for 3D Annotation of Hand and Object Poses

Shreyas Hampali, Mahdi Rad, Markus Oberweger, Vincent Lepetit, CVPR 2020

This repository contains code for annotating the 3D poses of a hand and an object captured with a single RGB-D camera setup.

Citation

If this code base was helpful in your research work, please consider citing us:

@INPROCEEDINGS{hampali2020honnotate,
title={HOnnotate: A method for 3D Annotation of Hand and Object Poses},
author={Shreyas Hampali and Mahdi Rad and Markus Oberweger and Vincent Lepetit},
booktitle = {CVPR},
year = {2020}
}

Installation

Setup

HOnnotate_ROOT denotes the directory into which you cloned this repository.

Data Capture

Run (Single camera setup)

Please refer to Section 4.2 of the paper. The stages below for automatic hand-object pose annotation follow the same pipeline as in the paper.

[Figure: Single camera pipeline]

0. Keypoints and Segmentations

We use a DeepLab network for segmentation and a Convolutional Pose Machine (CPM) for hand keypoint detection. Both networks are trained on the HO-3D dataset, and their weights can be downloaded from here.

0.1. Hand+Object segmentations

        python inference_seg.py --seq 'test'

The segmentation maps are saved in the segmentation directory of the test sequence.

0.2. Hand 2D keypoints

This step requires the segmentation script (step 0.1) to be run first.

        python inference_hand.py --seq 'test'

The 2D keypoints are saved in the CPMHand directory of the test sequence.

1. Hand and Object Pose Initializations

1.1. Object pose initialization

The object pose in all frames of the sequence is initialized by tracking. To reduce the effort of manual initialization, the object pose in the first frame can be a simple upright position. Before tracking the object pose, update the config file configObjPose.json in the configs folder of the test sequence.
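Initializing by tracking means that the optimization for each frame starts from the pose found for the previous frame. The loop below is a minimal sketch of that warm start, assuming a hypothetical `optimize_pose(frame, init_pose)` callable that stands in for the per-frame optimization in objectTrackingSingleFrame.py; it is not that script's actual API.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Pose = TypeVar("Pose")


def track_poses(
    frames: Sequence[Frame],
    first_pose: Pose,
    optimize_pose: Callable[[Frame, Pose], Pose],
) -> List[Pose]:
    """Warm-start tracking: frame t is initialized with the pose refined for frame t-1.

    `optimize_pose` is a hypothetical stand-in for the per-frame optimizer and
    `first_pose` is the manual initialization (e.g. a simple upright pose).
    """
    poses: List[Pose] = []
    init = first_pose
    for frame in frames:
        pose = optimize_pose(frame, init)  # refine, starting from the previous result
        poses.append(pose)
        init = pose  # the refined pose seeds the next frame
    return poses


# Toy usage: the "pose" is a number and each "optimization" moves halfway to the frame's target.
print(track_poses([1.0, 2.0, 3.0], 0.0, lambda target, init: init + 0.5 * (target - init)))
# prints [0.5, 1.25, 2.125]
```

Because each frame only needs to correct the small motion since the previous frame, a rough first-frame pose is usually enough for the optimizer to converge.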

The following script starts object tracking from the first frame of the sequence.

        cd ./optimization
        python objectTrackingSingleFrame.py --seq 'test' --doPyRender

Remove the --doPyRender flag to run the script faster; it is only needed for visualization. The script creates a dirt_obj_pose folder in the test sequence folder containing the optimization results for each frame and the visualization shown below.

The above figure shows the input frame after object segmentation, the object rendered in the initial pose, the depth map error and the silhouette error.
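The depth map and silhouette errors compare the rendering against the observed data. Below is a minimal NumPy sketch of two such per-pixel error terms, assuming the rendered depth is 0 wherever nothing is rendered and the observed mask comes from the segmentation of step 0.1. It is an illustrative formulation, not necessarily the exact terms used in the paper.

```python
import numpy as np


def silhouette_error(rendered_mask: np.ndarray, observed_mask: np.ndarray) -> float:
    """Mean squared difference between the rendered and segmented silhouettes (values in [0, 1])."""
    diff = rendered_mask.astype(np.float64) - observed_mask.astype(np.float64)
    return float(np.mean(diff ** 2))


def depth_error(rendered_depth: np.ndarray, observed_depth: np.ndarray,
                observed_mask: np.ndarray) -> float:
    """Mean absolute depth difference over pixels that are rendered, segmented and have sensor depth."""
    valid = (rendered_depth > 0) & (observed_depth > 0) & observed_mask.astype(bool)
    if not np.any(valid):
        return 0.0  # no overlap between rendering and observation
    return float(np.mean(np.abs(rendered_depth[valid] - observed_depth[valid])))


if __name__ == "__main__":
    observed_mask = np.array([[1, 1], [0, 0]])
    rendered_mask = np.array([[1, 0], [0, 0]])
    observed_depth = np.array([[0.50, 0.52], [0.0, 0.9]])
    rendered_depth = np.array([[0.48, 0.0], [0.0, 0.0]])
    print(silhouette_error(rendered_mask, observed_mask))  # 0.25
    print(depth_error(rendered_depth, observed_depth, observed_mask))
```

Restricting the depth term to valid pixels keeps missing sensor depth and unrendered regions from dominating the error; silhouette mismatches are penalized separately.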

1.2. Hand pose initialization

This script obtains an initial 3D grasp pose of the hand, relative to the object coordinate frame, from the 2D hand keypoints detected earlier (step 0.2). Refer to Eq. (12) in the paper for details of this optimization.

        python handPoseMultiframeInit.py --seq 'test'

The optimization uses the chumpy package and is therefore slow. The results are stored in the handInit folder of the test sequence.

The 2D keypoints are lifted to 3D keypoints, and the resulting hand mesh is shown in the above figure.
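Fitting 3D keypoints to detected 2D keypoints is typically driven by a reprojection error: project the 3D keypoints with the camera intrinsics and measure the pixel distance to the detections. Below is a minimal NumPy sketch of that term, assuming a pinhole camera with intrinsic matrix `K` and keypoints already expressed in the camera frame. The optional per-keypoint `confidence` weights are an illustrative assumption; Eq. (12) in the paper may contain additional terms.

```python
from typing import Optional

import numpy as np


def project(points_cam: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project Nx3 camera-frame points to Nx2 pixel coordinates with a pinhole model."""
    uvw = points_cam @ K.T            # row i is K @ p_i = (fx X + cx Z, fy Y + cy Z, Z)
    return uvw[:, :2] / uvw[:, 2:3]   # perspective division by depth Z


def reprojection_error(keypoints_3d: np.ndarray, keypoints_2d: np.ndarray, K: np.ndarray,
                       confidence: Optional[np.ndarray] = None) -> float:
    """Sum of (optionally confidence-weighted) squared pixel distances to the 2D detections."""
    residual = project(keypoints_3d, K) - keypoints_2d
    sq = np.sum(residual ** 2, axis=1)
    if confidence is not None:
        sq = confidence * sq  # down-weight unreliable detections
    return float(np.sum(sq))


if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    pts = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 1.0]])
    print(project(pts, K))  # approximately [[320, 240], [370, 240]]
    print(reprojection_error(pts, np.array([[320.0, 243.0], [374.0, 240.0]]), K))  # approximately 25
```

Minimizing this term over the hand pose parameters lifts the 2D detections to 3D; depth is constrained only through perspective, which is why later stages add depth and silhouette terms.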

2. Grasp Pose Estimation

A more accurate grasp pose of the hand is obtained starting from the initialization in step 1.2. Refer to Eq. (13) in the paper for more details. Modify the config file configHandPose.json in the configs folder of the test sequence, as in step 1.1. Update the betaFileName field in the JSON file to use different hand shape parameters or to point to the correct beta files. The beta parameters of the 10 subjects used to generate the dataset can be downloaded from here.
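Editing the config by hand works, but when many sequences need the same change, a small script can help. Below is a minimal sketch using Python's json module. It changes only the betaFileName field named above and leaves all other fields as they are. The paths in the usage note are hypothetical placeholders.

```python
import json
from pathlib import Path


def set_beta_file(config_path: Path, beta_file: str) -> None:
    """Point the betaFileName field of a JSON config at a different hand-shape (beta) file."""
    config = json.loads(config_path.read_text())
    config["betaFileName"] = beta_file  # every other field is preserved unchanged
    config_path.write_text(json.dumps(config, indent=4))
```

For example, `set_beta_file(Path("<test sequence>/configs/configHandPose.json"), "<path to beta file>")`, with both placeholders replaced by the actual locations on your machine. Note that rewriting the file with `json.dumps` normalizes its whitespace and indentation.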

        python handPoseMultiframe.py --seq 'test' --numIter 200 --showFig --doPyRender

Remove the --showFig and --doPyRender flags to run faster without visualization. The optimization results and visualizations (if enabled) are saved in the dirt_grasp_pose folder of the test sequence.

The first figure above shows the poses of the object and the hand during optimization. The first row is the input image, the second row shows the hand and object rendered with the poses at the current iteration, and the third and fourth rows show the depth and silhouette errors. The second figure above shows the grasp pose of the hand after optimization.

3. Object Pose Estimation

A more accurate object pose is obtained by tracking the object poses as explained in Section 4.2 of the paper. Unlike the object pose initialization in step 1.1, this stage also uses the hand mesh, rendered with the estimated grasp pose, in the optimization. Update the configHandObjPose.json file in the configs folder of the test sequence, as in step 1.1.

        python handObjectTrackingSingleFrame.py --seq 'test' --showFig --doPyRender

The results are saved in the dirt_hand_obj_pose folder of the test sequence.
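The grasp pose is expressed in the object coordinate frame, so the hand can be placed in each frame by applying the current object pose to it. Below is a minimal NumPy sketch of that step with 4x4 homogeneous transforms. It assumes `T_cam_obj` maps object coordinates to camera coordinates and that the hand vertices are already expressed in the object frame. These names are illustrative and are not taken from the code base.

```python
import numpy as np


def rigid_transform(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def transform_points(T: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 4x4 rigid transform to Nx3 points (row vectors)."""
    return points @ T[:3, :3].T + T[:3, 3]


if __name__ == "__main__":
    # Object pose: 90 degree rotation about z, then 0.5 along the camera's z axis.
    R_z90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    T_cam_obj = rigid_transform(R_z90, np.array([0.0, 0.0, 0.5]))
    hand_verts_obj = np.array([[1.0, 0.0, 0.0]])  # hand vertex in the object frame
    print(transform_points(T_cam_obj, hand_verts_obj))  # [[0. 1. 0.5]]
```

In this sketch the grasp is held fixed, so the hand moves rigidly with the object as the object pose changes from frame to frame.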

4. Multi-frame Pose Refinement

This stage optimizes all the hand and object pose variables jointly over multiple frames. Refer to Eq. (1) in the paper. The optimization is done in batches of frames.
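The --batchSize flag controls how many frames are optimized jointly. Below is a minimal sketch of splitting a sequence into consecutive batches. It assumes non-overlapping batches with a shorter final batch. The actual script may split frames differently (for example, with overlap between batches).

```python
from typing import List


def frame_batches(num_frames: int, batch_size: int) -> List[range]:
    """Split frame indices 0..num_frames-1 into consecutive, non-overlapping batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [range(start, min(start + batch_size, num_frames))
            for start in range(0, num_frames, batch_size)]


print([(b.start, b.stop) for b in frame_batches(45, 20)])  # [(0, 20), (20, 40), (40, 45)]
```

Larger batches couple more frames in one optimization, which can give smoother results but uses more memory per batch.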

        python handObjectRefinementMultiframe.py --seq 'test' --showFig --doPyRender --batchSize 20

The results are saved in the dirt_hand_obj_refine folder of the test sequence.
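The commands in the stages above can be chained in a small driver script. Below is a minimal sketch using subprocess. It assumes the step 0 scripts are run from HOnnotate_ROOT and the remaining scripts from ./optimization, following the `cd ./optimization` in step 1.1. The driver does not edit any config files, so configObjPose.json, configHandPose.json and configHandObjPose.json still need to be updated before running. The visualization flags are omitted for speed.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple


def pipeline_commands(seq: str, root: Path, batch_size: int = 20) -> List[Tuple[Path, List[str]]]:
    """Return (working directory, argv) pairs for the single-camera pipeline, in order."""
    py = sys.executable
    opt = root / "optimization"  # assumption: steps 1 to 4 are run from here
    return [
        (root, [py, "inference_seg.py", "--seq", seq]),                          # 0.1 segmentation
        (root, [py, "inference_hand.py", "--seq", seq]),                         # 0.2 2D keypoints
        (opt, [py, "objectTrackingSingleFrame.py", "--seq", seq]),               # 1.1 object init
        (opt, [py, "handPoseMultiframeInit.py", "--seq", seq]),                  # 1.2 hand init
        (opt, [py, "handPoseMultiframe.py", "--seq", seq, "--numIter", "200"]),  # 2 grasp pose
        (opt, [py, "handObjectTrackingSingleFrame.py", "--seq", seq]),           # 3 object pose
        (opt, [py, "handObjectRefinementMultiframe.py", "--seq", seq,
               "--batchSize", str(batch_size)]),                                  # 4 refinement
    ]


def run_pipeline(seq: str, root: Path) -> None:
    """Run every stage in order, stopping at the first stage that exits with an error."""
    for cwd, argv in pipeline_commands(seq, root):
        print("Running:", " ".join(argv[1:]), "in", cwd)
        subprocess.run(argv, cwd=cwd, check=True)
```

For example, `run_pipeline("test", Path("/path/to/HOnnotate_ROOT"))`. Because `check=True` raises on a non-zero exit code, a failed stage stops the pipeline before later stages read incomplete results.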

Known issues

The segmentation network often under-segments the hand near the fingertips, which causes a small shift in the final annotated fingertip keypoints. To account for this, the segmentation maps are corrected after step 3 using the estimated keypoints and the depth map. The segmentation correction script will be released soon.
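The correction script has not been released yet. To illustrate the idea only, the NumPy sketch below adds pixels around each estimated fingertip to the hand mask if their depth is close to the fingertip's depth. The radius and depth tolerance are hypothetical parameters, and depths are assumed to be in metres. This is not the authors' correction method.

```python
import numpy as np


def grow_mask_at_fingertips(hand_mask: np.ndarray, depth: np.ndarray, tips_uv: np.ndarray,
                            tips_depth: np.ndarray, radius_px: int = 6,
                            depth_tol: float = 0.01) -> np.ndarray:
    """Return a copy of hand_mask with depth-consistent pixels around each fingertip set to True.

    tips_uv: Nx2 pixel coordinates (u = column, v = row) of the estimated fingertips.
    tips_depth: N fingertip depths, in the same units as `depth` (metres assumed).
    """
    corrected = hand_mask.astype(bool).copy()
    h, w = depth.shape
    vv, uu = np.mgrid[0:h, 0:w]  # row (v) and column (u) index grids
    for (u, v), z in zip(tips_uv, tips_depth):
        near = (uu - u) ** 2 + (vv - v) ** 2 <= radius_px ** 2  # disc around the fingertip
        consistent = (depth > 0) & (np.abs(depth - z) <= depth_tol)  # valid, similar depth
        corrected |= near & consistent
    return corrected
```

The depth check keeps the disc from spilling onto the object or background behind the fingertip, which would otherwise over-segment the hand.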

Acknowledgements