Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set —— PyTorch implementation

This is an official PyTorch implementation of the following paper:

Y. Deng, J. Yang, S. Xu, D. Chen, Y. Jia, and X. Tong, Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set, IEEE Computer Vision and Pattern Recognition Workshop (CVPRW) on Analysis and Modeling of Faces and Gestures (AMFG), 2019. (Best Paper Award!)

The method enforces a hybrid-level weakly-supervised training for CNN-based 3D face reconstruction. It is fast, accurate, and robust to pose and occlusions. It achieves state-of-the-art performance on multiple datasets such as FaceWarehouse, MICC Florence, and NoW Challenge.

For the original tensorflow implementation, check this repo.

This implementation is written by S. Xu.

## 04/25/2023 Update

We released a new model to improve the results on "closed eye" images. We collected ~2K facial images with closed eyes and included them in the training data. The updated model has reconstruction accuracy similar to the previous one on the benchmarks, but gives better results for faces with closed eyes (see below). Here's the link (google drive) to the new model.

● Reconstruction accuracy

| Method | FaceWarehouse | MICC Florence |
|:----|:----:|:----:|
| Deep3DFace_PyTorch_20230425 | 1.60±0.44 | 1.54±0.49 |

● Visual quality

## Performance

● Reconstruction accuracy

The PyTorch implementation achieves lower shape reconstruction error (9% improvement) compared to the original TensorFlow implementation. Quantitative evaluation (average shape errors in mm) on several benchmarks is as follows:

| Method | FaceWarehouse | MICC Florence | NoW Challenge |
|:----|:----:|:----:|:----:|
| Deep3DFace Tensorflow | 1.81±0.50 | 1.67±0.50 | 1.54±1.29 |
| Deep3DFace PyTorch | 1.64±0.50 | 1.53±0.45 | 1.41±1.21 |
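
As a quick sanity check on the "9%" figure, the relative error reduction can be recomputed from the mean errors in the table above. This is only a sketch of the arithmetic; the dictionary and function names are ours and do not come from the repository.

```python
# Mean shape errors (mm) copied from the table above.
errors_mm = {
    "FaceWarehouse": {"tensorflow": 1.81, "pytorch": 1.64},
    "MICC Florence": {"tensorflow": 1.67, "pytorch": 1.53},
    "NoW Challenge": {"tensorflow": 1.54, "pytorch": 1.41},
}


def relative_improvement(old: float, new: float) -> float:
    """Return the error reduction of `new` relative to `old`, in percent."""
    return 100.0 * (old - new) / old


improvements = {
    name: relative_improvement(e["tensorflow"], e["pytorch"])
    for name, e in errors_mm.items()
}

for name, pct in improvements.items():
    print(f"{name}: {pct:.1f}% lower mean error")
```

Each benchmark lands between roughly 8% and 9.5%, which is consistent with the rounded figure quoted above.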

The comparison with state-of-the-art public 3D face reconstruction methods on the NoW face benchmark is as follows:

| Rank | Method | Median (mm) | Mean (mm) | Std (mm) |
|:----:|:-----------|:-----------:|:-----------:|:-----------:|
| 1. | DECA [Feng et al., SIGGRAPH 2021] | 1.09 | 1.38 | 1.18 |
| 2. | Deep3DFace PyTorch | 1.11 | 1.41 | 1.21 |
| 3. | RingNet [Sanyal et al., CVPR 2019] | 1.21 | 1.53 | 1.31 |
| 4. | Deep3DFace [Deng et al., CVPRW 2019] | 1.23 | 1.54 | 1.29 |
| 5. | 3DDFA-V2 [Guo et al., ECCV 2020] | 1.23 | 1.57 | 1.39 |
| 6. | MGCNet [Shang et al., ECCV 2020] | 1.31 | 1.87 | 2.63 |
| 7. | PRNet [Feng et al., ECCV 2018] | 1.50 | 1.98 | 1.88 |
| 8. | 3DMM-CNN [Tran et al., CVPR 2017] | 1.84 | 2.33 | 2.05 |

For more details about the evaluation, check the NoW Challenge website.

A recent benchmark, REALY, indicates that our method still achieves state-of-the-art performance! You can check their paper and website for more details.

● Visual quality

The PyTorch implementation achieves better visual consistency with the input images compared to the original TensorFlow version.

● Speed

The training speed is on par with the original tensorflow implementation. For more information, see here.

## Major changes

● Differentiable renderer

We use Nvdiffrast, a PyTorch library that provides high-performance primitive operations for rasterization-based differentiable rendering. The original TensorFlow implementation used tf_mesh_renderer instead.

● Face recognition model

We use Arcface, a state-of-the-art face recognition model, for perceptual loss computation. By contrast, the original tensorflow implementation used Facenet.
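
In the paper, the perceptual loss is the cosine distance between deep identity features of the input image and of the rendered face. The sketch below shows only that distance on plain Python lists. The toy feature vectors are made up, and in actual training the features would be tensors produced by the recognition backbone (typically 512-dimensional for Arcface) rather than short lists.

```python
import math
from typing import Sequence


def cosine_distance(feat_a: Sequence[float], feat_b: Sequence[float]) -> float:
    """Return 1 - cos(angle) between two feature vectors of equal length."""
    dot = sum(a * b for a, b in zip(feat_a, feat_b))
    norm_a = math.sqrt(sum(a * a for a in feat_a))
    norm_b = math.sqrt(sum(b * b for b in feat_b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy 4-D "identity features" (hypothetical values for illustration only).
input_feat = [0.2, 0.9, -0.1, 0.4]
render_feat = [0.25, 0.8, -0.05, 0.5]

loss = cosine_distance(input_feat, render_feat)
print(f"perceptual loss (cosine distance): {loss:.4f}")
```

The distance is 0 when the two features point in the same direction and grows as they diverge, so minimizing it pulls the rendered face toward the identity of the input regardless of feature magnitude.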

● Training configuration

Data augmentation is used during training and consists of random image shifting, scaling, rotation, and flipping. We also enlarge the training batch size from 5 to 32 to stabilize the training process.
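
To illustrate how the four augmentations can be combined, here is a minimal sketch that samples one 2x3 affine matrix (flip, rotation, and scale about the image centre, followed by a shift) and applies it to 2D points such as landmarks. It is not the repository's augmentation code: the function names, the 224-pixel image size, and all parameter ranges are assumptions chosen for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = List[List[float]]  # 2x3 matrix


def sample_affine(size: int, rng: random.Random,
                  max_shift: float = 10.0,
                  max_angle_deg: float = 10.0,
                  scale_range: Tuple[float, float] = (0.9, 1.1)) -> Affine:
    """Sample a random flip + rotation + scale about the image centre, plus a shift.

    All ranges are illustrative defaults, not the values used for training.
    """
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    scale = rng.uniform(*scale_range)
    flip = -1.0 if rng.random() < 0.5 else 1.0  # mirror the x axis half the time
    tx = rng.uniform(-max_shift, max_shift)
    ty = rng.uniform(-max_shift, max_shift)
    cos_s, sin_s = scale * math.cos(angle), scale * math.sin(angle)
    # Linear part = rotation * scale * diag(flip, 1).
    m00, m01 = cos_s * flip, -sin_s
    m10, m11 = sin_s * flip, cos_s
    c = size / 2.0
    # Offset chosen so that the image centre maps to (c + tx, c + ty).
    return [[m00, m01, c + tx - (m00 * c + m01 * c)],
            [m10, m11, c + ty - (m10 * c + m11 * c)]]


def apply_affine(matrix: Affine, points: Sequence[Point]) -> List[Point]:
    """Apply the 2x3 matrix to each (x, y) point."""
    return [(matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
             matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])
            for x, y in points]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
matrix = sample_affine(224, rng)
centre = apply_affine(matrix, [(112.0, 112.0)])[0]
print("image centre moves to", centre)
```

The same matrix would be used to warp the image and to move its landmarks so the two stay aligned. Note that after a horizontal flip, left and right landmarks generally need to be swapped as well.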

● Training data

We use an extra high-quality face image dataset, FFHQ, to increase the diversity of training data.

## Requirements

This implementation has only been tested on Ubuntu with Nvidia GPUs and CUDA installed, but it should also work on Windows with the libraries properly configured.

## Installation

Clone the repository and set up a conda environment with all dependencies as follows:

git clone https://github.com/sicxu/Deep3DFaceRecon_pytorch.git
cd Deep3DFaceRecon_pytorch
conda env create -f environment.yml
source activate deep3d_pytorch

Install Nvdiffrast library:

git clone -b 0.3.0 https://github.com/NVlabs/nvdiffrast
cd nvdiffrast    # ./Deep3DFaceRecon_pytorch/nvdiffrast
pip install .

Install Arcface Pytorch:

cd ..    # ./Deep3DFaceRecon_pytorch
git clone https://github.com/deepinsight/insightface.git
cp -r ./insightface/recognition/arcface_torch ./models/

## Inference with a pre-trained model

### Prepare prerequisite models

Our method uses Basel Face Model 2009 (BFM09) to represent 3D faces. Request access to BFM09 using this link. After access is granted, download "01_MorphableModel.mat". In addition, we use an Expression Basis provided by Guo et al. Download the Expression Basis (Exp_Pca.bin) using this link (google drive). Organize all files into the following structure:
```
Deep3DFaceRecon_pytorch
│
└─── BFM
    │
    └─── 01_MorphableModel.mat
    │
    └─── Exp_Pca.bin
    |
    └─── ...
```
We provide a model trained on a combination of CelebA, LFW, 300WLP, IJB-A, LS3D-W, and FFHQ datasets. Download the pre-trained model using this link (google drive) and organize the directory into the following structure:
```
Deep3DFaceRecon_pytorch
│
└─── checkpoints
    │
    └─── <model_name>
        │
        └─── epoch_20.pth
```
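
Before running inference, it can help to confirm that the files described above are where the scripts expect them. The helper below is a hypothetical convenience function, not part of the repository. It only checks the three paths named in this section, and `my_model` stands in for your own `<model_name>`.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_inference_files(repo_root: str, model_name: str, epoch: int = 20) -> List[str]:
    """Return the required files (as strings) that are not present under repo_root."""
    root = Path(repo_root)
    required = [
        root / "BFM" / "01_MorphableModel.mat",
        root / "BFM" / "Exp_Pca.bin",
        root / "checkpoints" / model_name / f"epoch_{epoch}.pth",
    ]
    return [str(path) for path in required if not path.is_file()]


# Demo on a throwaway directory that only contains Exp_Pca.bin.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "BFM").mkdir()
    (Path(tmp) / "BFM" / "Exp_Pca.bin").touch()
    demo_missing = missing_inference_files(tmp, "my_model")  # placeholder model name

for path in demo_missing:
    print("missing:", path)
```

In practice you would call `missing_inference_files(".", "<model_name>")` from the repository root and expect an empty list.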


### Test with custom images
To reconstruct 3d faces from test images, organize the test image folder as follows:

    Deep3DFaceRecon_pytorch
    │
    └─── <folder_to_test_images>
        │
        └─── *.jpg/*.png
        |
        └─── detections
            |
            └─── *.txt

The \*.jpg/\*.png files are test images. The \*.txt files are detected 5 facial landmarks with a shape of 5x2, and have the same name as the corresponding images. Check [./datasets/examples](datasets/examples) for a reference.
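
A small parser makes the expected 5x2 shape concrete. This sketch assumes each landmark file holds five lines of whitespace-separated `x y` values, which you should confirm against the files in ./datasets/examples. The coordinates in `sample` are invented for illustration.

```python
from typing import List, Tuple


def parse_landmarks(text: str) -> List[Tuple[float, float]]:
    """Parse five lines of 'x y' into a list of five (x, y) tuples."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) != 5 or any(len(row) != 2 for row in rows):
        raise ValueError("expected 5 rows of 2 values (a 5x2 array)")
    return [(float(x), float(y)) for x, y in rows]


# Made-up coordinates, one landmark per line.
sample = """\
70.5 110.2
150.3 108.9
110.0 150.4
78.1 190.6
145.7 189.3
"""

landmarks = parse_landmarks(sample)
print(f"parsed {len(landmarks)} landmarks, first = {landmarks[0]}")
```

Running a check like this over the `detections` folder before testing catches files with missing or extra rows early.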

Then, run the test script:

    # get reconstruction results of your custom images
    python test.py --name=<model_name> --epoch=20 --img_folder=<folder_to_test_images>

    # get reconstruction results of example images
    python test.py --name=<model_name> --epoch=20 --img_folder=./datasets/examples

**_Following [#108](https://github.com/sicxu/Deep3DFaceRecon_pytorch/issues/108), if you don't have an OpenGL environment, you can simply add "--use_opengl False" to use a CUDA context. Make sure you have updated nvdiffrast to the latest version._**

Results will be saved into ./checkpoints/<model_name>/results/<folder_to_test_images>, which contains the following files:

| File | Description |
|:----|:-----------|
| \*.png | A combination of cropped input image, reconstructed image, and visualization of projected landmarks. |
| \*.obj | Reconstructed 3D face mesh with predicted color (texture+illumination) in the world coordinate space. Best viewed in Meshlab. |
| \*.mat | Predicted 257-dimensional coefficients and 68 projected 2D facial landmarks. Best viewed in Matlab. |
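
The 257 coefficients are a concatenation of several parameter groups. The sketch below splits a flat 257-vector under the assumption that the layout is identity (80), expression (64), texture (80), rotation angles (3), lighting (27), and translation (3). This layout is our reading of the model and should be verified against the model code before you rely on it. The group names and the helper itself are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed layout of the 257 coefficients as (name, length); verify against the model code.
COEFF_LAYOUT: List[Tuple[str, int]] = [
    ("id", 80), ("exp", 64), ("tex", 80), ("angle", 3), ("gamma", 27), ("trans", 3),
]


def split_coeffs(coeffs: Sequence[float]) -> Dict[str, List[float]]:
    """Split a flat coefficient vector into named groups, in layout order."""
    expected = sum(length for _, length in COEFF_LAYOUT)
    if len(coeffs) != expected:
        raise ValueError(f"expected {expected} coefficients, got {len(coeffs)}")
    parts: Dict[str, List[float]] = {}
    start = 0
    for name, length in COEFF_LAYOUT:
        parts[name] = list(coeffs[start:start + length])
        start += length
    return parts


# Dummy vector 0..256 so each group's position is easy to see.
parts = split_coeffs([float(i) for i in range(257)])
print({name: len(values) for name, values in parts.items()})
```

Outside Matlab, the \*.mat file can be opened with `scipy.io.loadmat`. Inspect its keys first, since the saved file may already store the groups separately.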

## Training a model from scratch
### Prepare prerequisite models
1. We rely on [Arcface](https://github.com/deepinsight/insightface/tree/master/recognition/arcface_torch) to extract identity features for loss computation. Download the pre-trained model from Arcface using this [link](https://github.com/deepinsight/insightface/tree/master/recognition/arcface_torch#ms1mv3). By default, we use the resnet50 backbone ([ms1mv3_arcface_r50_fp16](https://onedrive.live.com/?authkey=%21AFZjr283nwZHqbA&id=4A83B6B633B029CC%215583&cid=4A83B6B633B029CC)). Organize the downloaded files into the following structure:

       Deep3DFaceRecon_pytorch
       │
       └─── checkpoints
           │
           └─── recog_model
               │
               └─── ms1mv3_arcface_r50_fp16
                   |
                   └─── backbone.pth

2. We initialize R-Net using the weights trained on [ImageNet](https://image-net.org/). Download the weights provided by PyTorch using this [link](https://download.pytorch.org/models/resnet50-0676ba61.pth), and organize the file into the following structure:

       Deep3DFaceRecon_pytorch
       │
       └─── checkpoints
           │
           └─── init_model
               │
               └─── resnet50-0676ba61.pth

3. We provide a landmark detector (tensorflow model) to extract 68 facial landmarks for loss computation. The detector is trained on [300WLP](http://www.cbsr.ia.ac.cn/users/xiangyuzhu/projects/3DDFA/main.htm), [LFW](http://vis-www.cs.umass.edu/lfw/), and [LS3D-W](https://www.adrianbulat.com/face-alignment) datasets. Download the trained model using this [link (google drive)](https://drive.google.com/file/d/1Jl1yy2v7lIJLTRVIpgg2wvxYITI8Dkmw/view?usp=sharing) and organize the file as follows:

       Deep3DFaceRecon_pytorch
       │
       └─── checkpoints
           │
           └─── lm_model
               │
               └─── 68lm_detector.pb

### Data preparation
1. To train a model with custom images, 5 facial landmarks of each image are needed in advance for an image pre-alignment process. We recommend using [dlib](http://dlib.net/) or [MTCNN](https://github.com/ipazc/mtcnn) to detect these landmarks. Then, organize all files into the following structure:

       Deep3DFaceRecon_pytorch
       │
       └─── datasets
           │
           └─── <folder_to_training_images>
               │
               └─── *.png/*.jpg
               |
               └─── detections
                   |
                   └─── *.txt

The \*.txt files contain 5 facial landmarks with a shape of 5x2, and should have the same name as their corresponding images.
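
If you detect landmarks with the MTCNN package linked above, each detection carries a `keypoints` dictionary with the keys `left_eye`, `right_eye`, `nose`, `mouth_left`, and `mouth_right`. The sketch below turns such a dictionary into the text of a 5x2 landmark file. The row order, the tab separator, and the two-decimal formatting are assumptions on our side, so compare the output with the files in ./datasets/examples before using it. The coordinates are invented.

```python
from typing import Dict, Tuple

# Assumed row order of the 5x2 file; compare with ./datasets/examples before use.
KEYPOINT_ORDER = ("left_eye", "right_eye", "nose", "mouth_left", "mouth_right")


def format_landmarks(keypoints: Dict[str, Tuple[float, float]]) -> str:
    """Format detector keypoints as five 'x<TAB>y' lines."""
    lines = [
        f"{keypoints[key][0]:.2f}\t{keypoints[key][1]:.2f}"
        for key in KEYPOINT_ORDER
    ]
    return "\n".join(lines) + "\n"


# Made-up keypoints in the dictionary shape returned per face by MTCNN.
example_keypoints = {
    "left_eye": (70.5, 110.2),
    "right_eye": (150.3, 108.9),
    "nose": (110.0, 150.4),
    "mouth_left": (78.1, 190.6),
    "mouth_right": (145.7, 189.3),
}

landmark_text = format_landmarks(example_keypoints)
print(landmark_text, end="")
```

Write `landmark_text` to `detections/<image_name>.txt` so that the file name matches the image it belongs to.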

2. Generate 68 landmarks and skin attention mask for images using the following script:

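       # preprocess training images
       python data_preparation.py --img_folder <folder_to_training_images>

       # alternatively, you can preprocess multiple image folders simultaneously
       python data_preparation.py --img_folder <folder_to_training_images1> <folder_to_training_images2> <folder_to_training_images3>

       # preprocess validation images
       python data_preparation.py --img_folder <folder_to_validation_images> --mode=val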

The script generates landmark and skin mask files and saves them into ./datasets/<folder_to_training_images>. It also writes a file containing the paths of all training data into ./datalist, which is then used by the training script.

### Train the face reconstruction network
Run the following script to train a face reconstruction model using the pre-processed data:

    # train with single GPU
    python train.py --name=<custom_experiment_name> --gpu_ids=0

    # train with multiple GPUs
    python train.py --name=<custom_experiment_name> --gpu_ids=0,1

    # train with other custom settings
    python train.py --name=<custom_experiment_name> --gpu_ids=0 --batch_size=32 --n_epochs=20


Training logs and model parameters will be saved into ./checkpoints/<custom_experiment_name>. 

By default, the script uses a batch size of 32 and trains the model for 20 epochs. For reference, the pre-trained model in this repo is trained with the default setting on an image collection of 300k images. A single iteration takes 0.8~0.9s on a single Tesla M40 GPU. The total training process takes around two days.
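
The "around two days" figure can be reproduced from the numbers above. This back-of-the-envelope sketch assumes exactly 300,000 training images and ignores validation, checkpointing, and data-loading overhead. The variable names are ours.

```python
import math

num_images = 300_000        # "300k images", taken as an exact count for this estimate
batch_size = 32
n_epochs = 20
sec_per_iter = (0.8, 0.9)   # reported range on a single Tesla M40

iters_per_epoch = math.ceil(num_images / batch_size)
total_iters = iters_per_epoch * n_epochs
hours = tuple(total_iters * seconds / 3600 for seconds in sec_per_iter)

print(f"{iters_per_epoch} iterations per epoch, {total_iters} in total")
print(f"estimated training time: {hours[0]:.1f} to {hours[1]:.1f} hours")
```

The estimate comes out at roughly 42 to 47 hours of pure iteration time, which matches the reported two days.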

To use a trained model, see the [Inference](https://github.com/sicxu/Deep3DFaceRecon_pytorch#inference-with-a-pre-trained-model) section.

## Contact
If you have any questions, please contact the paper authors.

## Citation

Please cite the following paper if this model helps your research:

    @inproceedings{deng2019accurate,
        title={Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set},
        author={Yu Deng and Jiaolong Yang and Sicheng Xu and Dong Chen and Yunde Jia and Xin Tong},
        booktitle={IEEE Computer Vision and Pattern Recognition Workshops},
        year={2019}
    }
##
The face images on this page are from the public [CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) dataset released by MMLab, CUHK.

Part of the code in this implementation takes [CUT](https://github.com/taesungp/contrastive-unpaired-translation) as a reference.
