This repository includes the PyTorch implementation of JIFF: Jointly-aligned Implicit Face Function for High Quality Single View Clothed Human Reconstruction (CVPR 2022), tested with PyTorch 1.10.0 on Ubuntu 18.04 with CUDA 10.2.
Yukang Cao, Guanying Chen, Kai Han, Wenqi Yang, Kwan-Yee K. Wong
Work done:
- THUman2.0 data rendering with pyrender and pyexr
- THUman2.0 rendered images

Our method further improves the face geometry and texture. It also successfully reconstructs the ears.
The conda environment (provided as a YAML file) for JIFF training and testing:
conda env create -f environment_JIFF.yaml
conda activate JIFF
You may also need to install torch_scatter manually for the 3D encoder:
pip install torch==1.10.0+cu102 torchvision==0.11.0+cu102 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch-scatter==2.0.9 -f https://pytorch-geometric.com/whl/torch-1.10.0+cu102.html
If you are using an RTX 3090 or newer GPU, install the matching CUDA 11.1 builds of torch, torchvision, and torch-scatter instead, i.e.,
pip install torch==1.10.0+cu111 torchvision==0.11.1+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch-scatter==2.0.9 -f https://pytorch-geometric.com/whl/torch-1.10.0+cu111.html
Follow the pytorch3d and kaolin installation instructions to install these required packages.
We use the THUman2.0 dataset for training and testing. You can download it from this link after following the instructions and submitting the request form.
Before moving on to rendering, we find it better to rescale the meshes to roughly the same scale as RenderPeople meshes and to translate each mesh so that its center of mass lies at the origin. You can add -o to specify an output path; otherwise the input meshes will be overwritten. Note that, although not strictly necessary, it is better to rotate each mesh to the frontal view.
python -m apps.rescale_THU -i {path_to_THUman2.0_dataset}
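The effect of this rescaling step can be sketched in plain NumPy: translate the vertices so the center of mass sits at the origin, then uniformly rescale. The target height, the choice of y as the vertical axis, and the helper name `rescale_and_center` are illustrative assumptions, not the exact values used by `apps.rescale_THU`.

```python
import numpy as np

def rescale_and_center(vertices, target_height=180.0):
    """Center a mesh at its center of mass and uniformly rescale it
    so its vertical extent matches target_height. target_height and
    the y-up convention are assumptions for illustration only."""
    centered = vertices - vertices.mean(axis=0)           # center of mass -> origin
    height = centered[:, 1].max() - centered[:, 1].min()  # extent along the y axis
    return centered * (target_height / height)

# Toy triangle: after processing, its height is target_height and
# its center of mass sits at the origin.
verts = np.array([[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 1.0, 0.0]])
scaled = rescale_and_center(verts)
```

Because the scaling is applied after centering, the center of mass stays at the origin.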
Generate the precomputed radiance transfer (PRT) for each .obj mesh, as instructed in PIFu:
python -m apps.prt_util -i {path_to_THUman2.0_dataset}
Render the meshes into 360-degree images. Together with the RENDER folder for images, this will generate the MASK, PARAM, UV_RENDER, UV_MASK, UV_NORMAL, and UV_POS folders, and copy the meshes to GEO/OBJ. Remember to add -e to enable EGL rendering.
python -m apps.render_data -i {path_to_THUman2.0_dataset} -o {path_to_THUman2.0_processed_for_training} -e
Please set up Deep3DFaceReconstruction and its PyTorch version first. Their instructions will lead you to "01_MorphableModel.mat", "BFM_model_front.mat", "Exp_Pca.bin", and "params.pt". Put the first three into the ./Face_Preprocess/BFM subfolder, and the last one into the ./Face_Preprocess/tdmm_utils/models subfolder.
Thanks to AnimeHead, you can download their weights for head detection from here and put them into ./Face_Preprocess/detect_utils.
After downloading the models and putting them in the designated places, the folder will look like:
Face_Preprocess/
├── BFM
│ ├── BFM_model_front.mat
│ ├── Exp_Pca.bin
│ ├── 01_MorphableModel.mat
├── detect_utils
│ ├── best_model.pt
├── tdmm_utils
│ ├── models
│ │ ├── params.pt
│ ....
Detect the face region in the images and generate the 3DMM mesh based on Deep3DFaceReconstruction. This processing will generate the FACE_3DMM, FACE_REGION, and FILE_NUMBER folders:
python ./Face_Preprocess/Process_face_seperately.py -i {path_to_THUman2.0_processed_for_training}/RENDER -o {path_to_THUman2.0_processed_for_training}
Because face/head estimation is not robust under extremely large poses, we suggest rotating the front-view face/head mesh to a certain angle for better training.
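Rotating a front-view mesh to a given view angle amounts to applying a rotation matrix about the vertical axis. The sketch below shows this with NumPy; the y-up axis convention and the helper name `rotate_y` are assumptions for illustration, not the repository's actual preprocessing code.

```python
import numpy as np

def rotate_y(vertices, angle_deg):
    """Rotate mesh vertices about the vertical (y) axis by angle_deg
    degrees, e.g. to turn a front-view face mesh toward a rendered
    view angle. The axis convention is an assumption."""
    t = np.deg2rad(angle_deg)
    R = np.array([[ np.cos(t), 0.0, np.sin(t)],
                  [ 0.0,       1.0, 0.0      ],
                  [-np.sin(t), 0.0, np.cos(t)]])
    return vertices @ R.T  # apply R to every vertex (row vectors)

# A point on the +x axis, turned 90 degrees about y.
rotated = rotate_y(np.array([[1.0, 0.0, 0.0]]), 90.0)
```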
Apply Iterative Closest Point (ICP) to better align the ground-truth mesh and the 3DMM mesh:
python ./Face_Preprocess/icp_align_gt.py -i {path_to_THUman2.0_processed_for_training}
The results will overwrite the previously estimated 3DMM meshes.
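The alignment can be illustrated with a minimal point-to-point ICP in NumPy: match each source point to its nearest target point, then solve the best rigid fit (Kabsch algorithm) and iterate. This is a sketch of the general technique, not the implementation in icp_align_gt.py, which may differ in matching, sampling, and convergence details.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst,
    assuming known point correspondences (Kabsch algorithm)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:   # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t

def icp_align(src, dst, iters=10):
    """Minimal point-to-point ICP: alternate nearest-neighbour
    matching with a rigid fit, returning the aligned source points."""
    cur = src.copy()
    for _ in range(iters):
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
        matched = dst[d2.argmin(axis=1)]   # closest target per source point
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
    return cur
```

With a small initial misalignment the nearest-neighbour matches are correct from the start, so a single rigid fit already recovers the transform exactly; larger misalignments need more iterations and a good initialization.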
After all the processing, the dataset directory will look like:
THU2_processed/
├── FILE_NUMBER
├── FACE_REGION
├── FACE_3DMM
├── UV_RENDER
├── UV_POS
├── UV_NORMAL
├── UV_MASK
├── PARAM
├── MASK
├── GEO/OBJ
├── val.txt
python -m apps.train_jiff_shape --dataroot {path_to_THUman2.0_processed_for_training} --random_flip --random_scale --num_stack 4 --num_hourglass 2 --resolution 512 --hg_down 'ave_pool' --norm 'group' --val_train_error --val_test_error --gpu_ids=0,1,2 --batch_size 3 --learning_rate 0.0001 --norm_color 'group' --sigma 3.5 --checkpoints_path {your_checkpoints_folder} --results_path {your_results_folder} --num_threads 10 --schedule 4
Coarse Pipeline
python -m apps.train_color_coarse --dataroot {path_to_THUman2.0_processed_for_training} --random_flip --random_scale --num_stack 4 --num_hourglass 2 --resolution 512 --hg_down 'ave_pool' --norm 'group' --val_train_error --val_test_error --gpu_ids=0,1,2 --batch_size 3 --learning_rate 0.0001 --norm_color 'group' --sigma 0.1 --checkpoints_path {your_checkpoints_folder} --results_path {your_results_folder} --num_threads 10 --load_netG_checkpoint_path {path_to_your_netG_checkpoint} --num_sample_inout 0 --num_sample_color 10000
Fine Pipeline (training takes around 30 hours per epoch)
python -m apps.train_jiff_color_fine --dataroot {path_to_THUman2.0_processed_for_training} --random_flip --random_scale --num_stack 4 --num_hourglass 2 --resolution 512 --hg_down 'ave_pool' --norm 'group' --val_train_error --val_test_error --gpu_ids=0,1,2 --batch_size 3 --learning_rate 0.0001 --norm_color 'group' --sigma 0.1 --checkpoints_path {your_checkpoints_folder} --results_path {your_results_folder} --num_threads 10 --load_netG_checkpoint_path {path_to_your_netG_checkpoint} --load_netC_coarse_checkpoint_path {path_to_your_netC_coarse_checkpoint} --num_sample_inout 0 --num_sample_color 10000
We provide reconstructed results on the THUman2.0 test dataset and the RenderPeople free model for your reference.
For testing, you need to provide a mask image together with each rendered image.
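If your renderer writes RGBA images with a transparent background, a binary mask can be derived from the alpha channel. This is only a sketch under that assumption; `mask_from_rgba` is a hypothetical helper, and the masks produced during data rendering (the MASK folder) should be preferred when available.

```python
import numpy as np

def mask_from_rgba(rgba):
    """Binary foreground mask from an RGBA render: any pixel with
    non-zero alpha counts as foreground (255), the rest as
    background (0). Assumes a transparent background."""
    return (rgba[..., 3] > 0).astype(np.uint8) * 255
```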
python ./apps/eval_pifu.py --batch_size 1 --num_stack 4 --num_hourglass 2 --resolution 512 --hg_down 'ave_pool' --norm 'group' --norm_color 'group' --test_folder_path {path_to_your_testdata} --load_netG_checkpoint_path {path_to_your_netG_checkpoint} --name 'coarse_reconstruction' --load_netC_coarse_checkpoint_path {path_to_your_netC_coarse_checkpoint}
Note that this will give you the PIFu reconstruction with its own texture. The sampling strategy stays the same as in JIFF, so you can directly compare PIFu (with more face sampling) against JIFF.
python ./Face_Preprocess/gen_3dmm.py -i {test_image_folder}
python ./Face_Preprocess/align_3dmm.py -pifu {path_to_pifu_reconstruction} -test {path_to_test_folder}
REMINDER FOR THE ALIGNMENT BETWEEN THE ROUGH RECONSTRUCTION AND THE 3DMM:
Check the 3DMM mesh and the coarse PIFu reconstruction after the ICP alignment to verify that the rough shapes are well aligned, as the face reconstruction depends on the 3DMM mesh. Any shift along the x, y, or z axis will degrade the quality of both geometry and texture.
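One quick sanity check for such a shift is to compare the centroids of the aligned 3DMM mesh and the face region of the coarse reconstruction. The helper names and the tolerance below are illustrative assumptions, not values used by the repository.

```python
import numpy as np

def centroid_shift(verts_3dmm, verts_pifu_face):
    """Per-axis offset between the centroids of the 3DMM vertices and
    the face-region vertices of the coarse reconstruction."""
    return verts_3dmm.mean(axis=0) - verts_pifu_face.mean(axis=0)

def is_well_aligned(verts_3dmm, verts_pifu_face, tol=0.01):
    """Flag alignment as acceptable when every axis offset is below
    tol. The tolerance is an illustrative assumption and should be
    tuned to the mesh scale."""
    return bool(np.all(np.abs(centroid_shift(verts_3dmm, verts_pifu_face)) < tol))
```

A large offset on any axis suggests the ICP step failed and the 3DMM mesh should be re-aligned before running the fine pipeline.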
python3 ./apps/eval_jiff.py --batch_size 1 --num_stack 4 --num_hourglass 2 --resolution 512 --hg_down 'ave_pool' --norm 'group' --norm_color 'group' --test_folder_path {path_to_your_testdata} --load_netG_checkpoint_path {path_to_your_netG_checkpoint} --name 'test_jiff' --load_netC_coarse_checkpoint_path {path_to_your_netC_coarse_checkpoint} --load_netC_checkpoint_path {path_to_your_netC_checkpoint}
If you find this code useful, please consider citing:
@inproceedings{cao22jiff,
author = {Cao, Yukang and Chen, Guanying and Han, Kai and Yang, Wenqi and Wong, Kwan-Yee K.},
title = {JIFF: Jointly-Aligned Implicit Face Function for High Quality Single View Clothed Human Reconstruction},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {2729-2739}
}
Our implementation is based on PIFu and Deep3DFaceReconstruction (and its PyTorch version).
This work was partially supported by Hong Kong RGC GRF grant (project# 17203119), the National Key R&D Program of China (No. 2018YFB1800800), and the Basic Research Project No. HZQB-KCZYZ2021067 of the Hetao Shenzhen-HK S&T Cooperation Zone. We thank Yuanlu Xu for sharing results of ARCH and ARCH++.