Jihyun Lee1, Shunsuke Saito2, Giljoo Nam2, Minhyuk Sung1, Tae-Kyun (T-K) Kim1,3
1 KAIST, 2 Codec Avatars Lab, Meta, 3 Imperial College London
[Project Page] [Paper] [Supplementary Video]
We present InterHandGen, a novel framework that learns the generative prior of two-hand interaction. Sampling from our model yields plausible and diverse two-hand shapes in close interaction, with or without an object. Our prior can be incorporated into any optimization or learning method to reduce ambiguity in an ill-posed setup. Our key observation is that directly modeling the joint distribution of multiple instances imposes high learning complexity due to its combinatorial nature. We therefore decompose the modeling of the joint distribution into the modeling of factored unconditional and conditional single-instance distributions. In particular, we introduce a diffusion model that learns the single-hand distribution both unconditionally and conditioned on the other hand via conditioning dropout. For sampling, we combine anti-penetration and classifier-free guidance to enable plausible generation. Furthermore, we establish a rigorous evaluation protocol for two-hand synthesis, on which our method significantly outperforms baseline generative models in terms of plausibility and diversity. We also demonstrate that our diffusion prior can boost the performance of two-hand reconstruction from monocular in-the-wild images, achieving new state-of-the-art accuracy.
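As a rough illustration of the conditioning-dropout and classifier-free-guidance ideas described above, the sketch below shows how a single denoiser can serve both the unconditional and the conditional single-hand distribution. The function names, call signature, dropout probability, and guidance scale are illustrative assumptions, not the released implementation.

```python
import torch

def drop_condition(cond, p_drop=0.1):
    # Conditioning dropout (training): occasionally hide the other-hand
    # condition so one denoiser learns both the unconditional and the
    # conditional single-hand distribution. p_drop is an assumed value.
    return None if torch.rand(()).item() < p_drop else cond

def guided_eps(denoiser, x_t, t, cond, w=2.0):
    # Classifier-free guidance (sampling): extrapolate from the unconditional
    # noise prediction toward the conditional one. `denoiser`, its call
    # signature, and the scale `w` are placeholders, not the project's API.
    # The paper additionally combines this with anti-penetration guidance.
    eps_cond = denoiser(x_t, t, cond)    # conditioned on the first hand
    eps_uncond = denoiser(x_t, t, None)  # condition dropped
    return eps_uncond + w * (eps_cond - eps_uncond)
```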
[Apr 14th, 2024] There was a bug in the anti-penetration loss guidance weighting; it is now fixed. I am sorry for the inconvenience.
## Installation

1. Clone this repository and install the dependencies listed in `requirements.txt`.

$ git clone https://github.com/jyunlee/InterHandGen.git
$ cd InterHandGen
$ pip install -r requirements.txt
2. Install [ChamferDistancePytorch](https://github.com/ThibaultGROUEIX/ChamferDistancePytorch).
$ cd utils
$ git clone https://github.com/ThibaultGROUEIX/ChamferDistancePytorch.git
$ cd ChamferDistancePytorch/chamfer3D
$ python setup.py install
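If you want to verify the build, a minimal check along these lines should work. The import path assumes you run it from the `utils` directory and that a CUDA device is available; adjust as needed.

```python
import torch
from ChamferDistancePytorch.chamfer3D import dist_chamfer_3D

# Run the compiled Chamfer distance extension on two random point clouds.
cham = dist_chamfer_3D.chamfer_3DDist()
p1 = torch.rand(1, 778, 3).cuda()  # 778 = number of MANO hand vertices
p2 = torch.rand(1, 778, 3).cuda()
dist1, dist2, idx1, idx2 = cham(p1, p2)
print(dist1.mean().item(), dist2.mean().item())
```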
## Data Preparation
1. Download InterHand2.6M dataset from [its official website](https://mks0601.github.io/InterHand2.6M/).
2. Follow the data pre-processing steps of [IntagHand](https://github.com/Dw1010/IntagHand) (`dataset/interhand.py`). Note that you only need the shape annotation files (`anno/*.pkl`), and you can skip the image preprocessing parts.
3. Download MANO model from [its official website](https://mano.is.tue.mpg.de/). Place the downloaded `mano_v1_2` folder under `misc` directory.
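After these steps, a quick path check like the one below can confirm the layout. It only covers the MANO files whose location is fixed by this README; the processed InterHand2.6M annotations live wherever you placed them in step 2.

```python
from pathlib import Path

# The MANO v1.2 download ships MANO_LEFT.pkl and MANO_RIGHT.pkl under `models/`.
for name in ('MANO_LEFT.pkl', 'MANO_RIGHT.pkl'):
    p = Path('misc/mano_v1_2/models') / name
    print(p, 'found' if p.exists() else 'MISSING')
```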
## Network Training
Train your own two-hand interaction diffusion model using the following command. Note that the pre-trained weights can be downloaded from [this Google Drive link](https://drive.google.com/drive/folders/19Hbfuy7Vg2UVLMNMHbsMKApOS07EZ0lL?usp=drive_link).
$ CUDA_VISIBLE_DEVICES={gpu_num} python interhandgen.py --train
## Network Inference
Sample two-hand interactions from the trained model. The number of samples is controlled by `vis_epoch` (the number of sampling iterations) and `vis_batch` (the number of samples per iteration) in the config file (`configs/default.yml`). For a full evaluation, set `vis_epoch = 4` and `vis_batch = 2500` to generate 4 * 2500 = 10,000 samples.
$ CUDA_VISIBLE_DEVICES={gpu_num} python interhandgen.py --model_path {trained_model_path}
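For reference, a small snippet like this can report how many samples the current config will produce. It assumes `vis_epoch` and `vis_batch` are top-level keys in `configs/default.yml`; adjust the lookups if they are nested differently.

```python
import yaml

# Read the sampling settings and print the total number of generated samples
# (sampling iterations x samples per iteration).
with open('configs/default.yml') as f:
    cfg = yaml.safe_load(f)

print(cfg['vis_epoch'] * cfg['vis_batch'])
```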
## Evaluation
Compute the evaluation metrics using the sampled two-hand interactions.
$ cd eval
$ CUDA_VISIBLE_DEVICES={gpu_num} python evaluate.py --sample_num {number_of_samples} --doc {trained_model_dir}
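For intuition, the plausibility side of the evaluation relies on an FID-style Fréchet distance between features of real and generated interactions (extracted with a PointNet-based network, as noted in the acknowledgements). The generic form of that distance looks like the sketch below; it is not the project's exact evaluation code.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_a, feats_b):
    # Fit a Gaussian to each (N, D) feature set and return the Frechet
    # distance between the two Gaussians.
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```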
## Citation
If you find this work useful, please consider citing our paper.
```
@inproceedings{lee2024interhandgen,
title = {InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion},
author = {Lee, Jihyun and Saito, Shunsuke and Nam, Giljoo and Sung, Minhyuk and Kim, Tae-Kyun},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2024}
}
```
## Acknowledgements
- Parts of our code are based on [DiffPose](https://github.com/GONGJIA0208/Diffpose) (forward/reverse diffusion process), [Pointnet_Pointnet2_pytorch](https://github.com/yanx27/Pointnet_Pointnet2_pytorch) (feature extraction network for evaluation), [MoCapDeform](https://github.com/Malefikus/MoCapDeform) (anti-penetration guidance), and [motion-diffusion-model](https://github.com/GuyTevet/motion-diffusion-model) (evaluation metrics). We thank the authors for releasing their code.