This is a TensorFlow implementation of the following paper:
Disentangled and Controllable Face Image Generation via 3D Imitative-Contrastive Learning, CVPR 2020. (Oral)
Yu Deng, Jiaolong Yang, Dong Chen, Fang Wen, and Xin Tong
Paper: https://arxiv.org/abs/2004.11660
Abstract: We propose DiscoFaceGAN, an approach for face image generation of virtual people with DISentangled, precisely-COntrollable latent representations for identity of non-existing people, expression, pose, and illumination. We embed 3D priors into adversarial learning and train the network to imitate the image formation of an analytic 3D face deformation and rendering process. To deal with the generation freedom induced by the domain gap between real and rendered faces, we further introduce contrastive learning to promote disentanglement by comparing pairs of generated images. Experiments show that through our imitative-contrastive learning, the factor variations are very well disentangled and the properties of a generated face can be precisely controlled. We also analyze the learned latent space and present several meaningful properties supporting factor disentanglement. Our method can also be used to embed real images into the disentangled latent space. We hope our method could provide new understandings of the relationship between physical properties and deep image synthesis.
When generating face images, we can freely change the four factors including identity, expression, lighting, and pose. The factor variations are highly disentangled: changing one factor does not affect others.
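Conceptually, the generator's input latent is a concatenation of independent per-factor codes, so resampling one block while keeping the others fixed varies only that factor. A minimal numpy sketch of this idea (the dimension numbers below are illustrative placeholders, not the sizes used in the released model):

```python
import numpy as np

# Hypothetical per-factor latent sizes (illustrative only, not the model's real dims).
DIMS = {"identity": 128, "expression": 32, "lighting": 16, "pose": 3}

def sample_latent(rng, fixed=None):
    """Sample a full latent; any block listed in `fixed` is reused as-is."""
    fixed = fixed or {}
    return {k: fixed.get(k, rng.standard_normal(d)) for k, d in DIMS.items()}

rng = np.random.default_rng(0)
base = sample_latent(rng)
# Vary pose only: every block except "pose" is copied from `base`.
pose_variant = sample_latent(rng, fixed={k: v for k, v in base.items() if k != "pose"})
```

Feeding `base` and `pose_variant` to the generator would then yield the same person under two poses, which is exactly the disentanglement property above.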
We achieve reference-based generation where we extract expression, pose and lighting from a given image and generate new identities with similar properties.
We can use our method to embed a real image into the disentangled latent space and edit it, such as pose manipulation.
We can edit the lighting of a real image.
We can also achieve expression transfer of real images.
The training code of our model is mainly borrowed from StyleGAN, although our method can be applied to any form of GANs.
git clone https://github.com/microsoft/DiscoFaceGAN.git
cd DiscoFaceGAN
# Generate face images with random variations of expression, lighting, and pose
python generate_images.py
# Generate face images with random variations of expression
python generate_images.py --factor 1
# Generate face images with random variations of lighting
python generate_images.py --factor 2
# Generate face images with random variations of pose
python generate_images.py --factor 3
python preprocess_data.py --image_path=<raw_image_path> --lm_path=<raw_lm_path> --save_path=<save_path_for_processed_data>
python dataset_tool.py create_from_images ./datasets/ffhq_align <save_path_for_processed_data>/img
cd vae
# train VAE for identity coefficients
python demo.py --datapath <save_path_for_processed_data>/coeff --factor id
# train VAE for expression coefficients
python demo.py --datapath <save_path_for_processed_data>/coeff --factor exp
# train VAE for lighting coefficients
python demo.py --datapath <save_path_for_processed_data>/coeff --factor gamma
# train VAE for pose coefficients
python demo.py --datapath <save_path_for_processed_data>/coeff --factor rot
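Each VAE maps one group of 3DMM coefficients to a standard-normal latent so that the factors can later be sampled independently. A minimal numpy sketch of the reparameterization step at the heart of any VAE's training (the encoder/decoder networks and the 16-dim latent size here are placeholders, not the repository's actual architecture):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I): makes the sampling step differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)) summed over latent dims; pulls codes toward N(0, I)."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

rng = np.random.default_rng(0)
mu = np.zeros((4, 16))       # batch of 4, hypothetical 16-dim latent
log_var = np.zeros((4, 16))  # unit variance
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

With `mu = 0` and `log_var = 0` the posterior already equals the prior, so the KL term is zero; during training the KL and reconstruction losses trade off to shape the coefficient distribution into a Gaussian latent space.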
# Stage 1 with only imitative losses, training on 15,000k (15 million) images
python train.py
# Stage 2 with both imitative losses and contrastive losses, training on another 5,000k (5 million) images
python train.py --stage 2 --run_id <stage1_model_id> --snapshot <stage1_model_snapshot> --kimg <stage1_model_snapshot>
# For example
python train.py --stage 2 --run_id 0 --snapshot 14926 --kimg 14926
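The two stages differ only in which auxiliary losses are active: stage 1 adds the imitative terms to the adversarial loss, and stage 2 additionally switches on the contrastive terms. A schematic sketch of that combination (the loss values and unit weights are placeholders; the real weights live in the repository's training configuration):

```python
def total_loss(adv, imitative, contrastive, stage, w_im=1.0, w_ct=1.0):
    """Combine the generator losses per training stage (weights are illustrative)."""
    loss = adv + w_im * imitative
    if stage == 2:  # contrastive terms only join in stage 2
        loss += w_ct * contrastive
    return loss

stage1 = total_loss(adv=0.5, imitative=0.2, contrastive=0.3, stage=1)
stage2 = total_loss(adv=0.5, imitative=0.2, contrastive=0.3, stage=2)
```

Because stage 2 resumes from the stage-1 snapshot (`--run_id`/`--snapshot` above), `--kimg` is set to the snapshot's image count so the schedule continues where stage 1 left off.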
After training, the network can be used in the same way as the provided pre-trained model:
# Generate face images with specific model
python generate_images.py --model <your_model_path.pkl>
We trained the model on 4 Tesla P100 GPUs. Training takes 6d 15h for stage 1 and 5d 8h for stage 2.
If you have any questions, please contact Yu Deng (dengyu2008@hotmail.com) or Jiaolong Yang (jiaoyan@microsoft.com).
Copyright © Microsoft Corporation.
Licensed under the MIT license.
Please cite the following paper if this model helps your research:
@inproceedings{deng2020disentangled,
title={Disentangled and Controllable Face Image Generation via 3D Imitative-Contrastive Learning},
author={Yu Deng and Jiaolong Yang and Dong Chen and Fang Wen and Xin Tong},
booktitle={IEEE Computer Vision and Pattern Recognition},
year={2020}
}
The real face images on this page are from the public FFHQ dataset released under Creative Commons BY-NC-SA 4.0 license. Detailed information can be found on its website.