williamyang1991 / StyleGANEX

[ICCV 2023] StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces

StyleGANEX - Official PyTorch Implementation

https://user-images.githubusercontent.com/18130694/224256980-03fb15e7-9858-4300-9d35-7604d03c69f9.mp4

This repository provides the official PyTorch implementation for the following paper:

StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces
Shuai Yang, Liming Jiang, Ziwei Liu and Chen Change Loy
In ICCV 2023.
Project Page | Paper | Supplementary Video

Abstract: Recent advances in face manipulation using StyleGAN have produced impressive results. However, StyleGAN is inherently limited to cropped aligned faces at a fixed image resolution it is pre-trained on. In this paper, we propose a simple and effective solution to this limitation by using dilated convolutions to rescale the receptive fields of shallow layers in StyleGAN, without altering any model parameters. This allows fixed-size small features at shallow layers to be extended into larger ones that can accommodate variable resolutions, making them more robust in characterizing unaligned faces. To enable real face inversion and manipulation, we introduce a corresponding encoder that provides the first-layer feature of the extended StyleGAN in addition to the latent style code. We validate the effectiveness of our method using unaligned face inputs of various resolutions in a diverse set of face manipulation tasks, including facial attribute editing, super-resolution, sketch/mask-to-face translation, and face toonification.
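
To make the core idea concrete, here is a toy 1-D sketch in pure Python of how dilation enlarges the input span a kernel covers while reusing exactly the same weights. It is not the repository's code; the function names `receptive_span` and `dilated_conv1d` are hypothetical and chosen only for illustration.

```python
def receptive_span(kernel_size: int, dilation: int) -> int:
    """Number of input positions covered by one application of a dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(signal, weights, dilation=1):
    """Valid (no padding) 1-D correlation with the given dilation.

    The weights are untouched; only the spacing between taps changes.
    """
    span = receptive_span(len(weights), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + i * dilation] for i, w in enumerate(weights)))
    return out


weights = [1, 0, -1]                 # the same three parameters in both cases
signal = [0, 1, 4, 9, 16, 25, 36]

print(receptive_span(3, 1), dilated_conv1d(signal, weights, dilation=1))
print(receptive_span(3, 2), dilated_conv1d(signal, weights, dilation=2))
```

With dilation 2 the same three weights span five input positions instead of three, which is the sense in which the receptive field is rescaled without altering any parameters.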

Features:

overview

Updates

Installation

Clone this repo:

git clone https://github.com/williamyang1991/StyleGANEX.git
cd StyleGANEX

Dependencies:

We have tested on:


(1) Inference

Inference Notebook

To help users get started, we provide a Jupyter notebook found in ./inference_playground.ipynb that allows one to visualize the performance of StyleGANEX. The notebook will download the necessary pretrained models and run inference on the images found in ./data/.

Gradio demo

We also provide a UI for testing StyleGANEX that is built with gradio. Running the following command in a terminal will launch the demo:

python app_gradio.py

This demo is also hosted on Hugging Face.

Pre-trained Models

Pre-trained models can be downloaded from Google Drive, Baidu Cloud (access code: luck) or Hugging Face:

| Task | Model | Description |
| --- | --- | --- |
| Inversion | styleganex_inversion.pt | pre-trained model for StyleGANEX inversion |
| Image translation | styleganex_sr32.pt | pre-trained model specially for 32x face super resolution |
| | styleganex_sr.pt | pre-trained model for 4x-48x face super resolution |
| | styleganex_sketch2face.pt | pre-trained model for sketch-to-face translation |
| | styleganex_mask2face.pt | pre-trained model for parsing map-to-face translation |
| Video editing | styleganex_edit_hair.pt | pre-trained model for hair color editing on videos |
| | styleganex_edit_age.pt | pre-trained model for age editing on videos |
| | styleganex_toonify_cartoon.pt | pre-trained Cartoon model for video face toonification |
| | styleganex_toonify_arcane.pt | pre-trained Arcane model for video face toonification |
| | styleganex_toonify_pixar.pt | pre-trained Pixar model for video face toonification |
| Supporting model | faceparsing.pth | BiSeNet for face parsing from face-parsing.PyTorch |

We suggest placing the downloaded models in ./pretrained_models/.
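
As a convenience, the following minimal sketch checks which of the model files listed above are present in ./pretrained_models/. The helper `missing_models` is hypothetical and not part of the repository; the file names are copied from the model list above, and you may only need a subset of them for a given task.

```python
from pathlib import Path

# File names copied from the model list above.
EXPECTED_MODELS = [
    "styleganex_inversion.pt",
    "styleganex_sr32.pt",
    "styleganex_sr.pt",
    "styleganex_sketch2face.pt",
    "styleganex_mask2face.pt",
    "styleganex_edit_hair.pt",
    "styleganex_edit_age.pt",
    "styleganex_toonify_cartoon.pt",
    "styleganex_toonify_arcane.pt",
    "styleganex_toonify_pixar.pt",
    "faceparsing.pth",
]


def missing_models(model_dir="./pretrained_models"):
    """Return the expected model files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_MODELS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing models:", ", ".join(missing))
    else:
        print("All pre-trained models found.")
```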

StyleGANEX Inversion

We can embed a face image into the latent space of StyleGANEX to obtain its w+ latent code and the first-layer feature f with inversion.py.

python inversion.py --ckpt STYLEGANEX_MODEL_PATH --data_path FACE_IMAGE_PATH

The results are saved in the folder ./output/ and consist of a reconstructed image, FILE_NAME_inversion.jpg, and a latent file, FILE_NAME_inversion.pt. You can load the w+ latent code and the first-layer feature f with:

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
latents = torch.load('./output/FILE_NAME_inversion.pt', map_location=device)
wplus_hat = latents['wplus'].to(device) # w+
f_hat = [latents['f'][0].to(device)]    # f

The ./inference_playground.ipynb provides some face editing examples based on wplus_hat and f_hat.
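
The output naming described above can be sketched as a small helper. This assumes FILE_NAME is the input file's base name without its extension; the function `inversion_outputs` and the example path are hypothetical, not part of the repository.

```python
from pathlib import Path


def inversion_outputs(image_path, output_dir="./output"):
    """Return (reconstruction_path, latent_path) for a given input image.

    Assumption: FILE_NAME is the input's base name without its extension.
    """
    stem = Path(image_path).stem
    out = Path(output_dir)
    return out / f"{stem}_inversion.jpg", out / f"{stem}_inversion.pt"


# Hypothetical input path, used only for illustration.
jpg_path, pt_path = inversion_outputs("./data/example.png")
print(jpg_path)
print(pt_path)
```

The second path is the one you would pass to torch.load to recover the latent code and first-layer feature.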

Image Translation

image_translation.py supports face super-resolution, sketch-to-face translation and parsing map-to-face translation.

python image_translation.py --ckpt STYLEGANEX_MODEL_PATH --data_path FACE_INPUT_PATH

The results are saved in the folder ./output/.
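
If you want to process several inputs with the same checkpoint, one option is to build the command shown above once per input. The sketch below only constructs and prints the commands; it uses nothing beyond the --ckpt and --data_path flags documented here. The helper `build_command` and the input file names are hypothetical.

```python
import shlex
import sys


def build_command(script, ckpt, data_path):
    """Build the argument list for one inference run.

    Only --ckpt and --data_path are used, as in the command shown above.
    """
    return [sys.executable, script, "--ckpt", ckpt, "--data_path", data_path]


# Hypothetical input files; replace with your own paths.
inputs = ["./data/face_a.jpg", "./data/face_b.jpg"]
ckpt = "./pretrained_models/styleganex_sr.pt"

for path in inputs:
    cmd = build_command("image_translation.py", ckpt, path)
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when paths contain spaces.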

Additional notes to consider:

Video Editing

video_editing.py supports video facial attribute editing and video face toonification.

python video_editing.py --ckpt STYLEGANEX_MODEL_PATH --data_path FACE_INPUT_PATH

The results are saved in the folder ./output/.

Additional notes to consider:


(2) Training

Preparing your Data

As an example, assume we wish to run encoding using ffhq (dataset_type=ffhq_encode). We first go to configs/paths_config.py and define:

dataset_paths = {
    'ffhq': '/path/to/ffhq/realign320x320',
    'ffhq_test': '/path/to/ffhq/realign320x320_test'
}

The transforms for the experiment are defined in the class EncodeTransforms in configs/transforms_config.py.
Finally, in configs/data_configs.py, we define:

DATASETS = {
   'ffhq_encode': {
        'transforms': transforms_config.EncodeTransforms,
        'train_source_root': dataset_paths['ffhq'],
        'train_target_root': dataset_paths['ffhq'],
        'test_source_root': dataset_paths['ffhq_test'],
        'test_target_root': dataset_paths['ffhq_test'],
    },
}

When defining our datasets, we will take the values in the above dictionary.
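
To illustrate how a dataset_type maps to concrete paths, here is a self-contained sketch that mirrors the two dictionaries above. It is a stand-in, not the repository's loading code: the transforms entry is replaced by a plain string so the snippet runs on its own, and `resolve_dataset` is a hypothetical helper.

```python
# Stand-ins for configs/paths_config.py and configs/data_configs.py.
dataset_paths = {
    "ffhq": "/path/to/ffhq/realign320x320",
    "ffhq_test": "/path/to/ffhq/realign320x320_test",
}

DATASETS = {
    "ffhq_encode": {
        "transforms": "EncodeTransforms",  # placeholder for the real class
        "train_source_root": dataset_paths["ffhq"],
        "train_target_root": dataset_paths["ffhq"],
        "test_source_root": dataset_paths["ffhq_test"],
        "test_target_root": dataset_paths["ffhq_test"],
    },
}


def resolve_dataset(dataset_type):
    """Look up a dataset entry and fail early with a readable message."""
    if dataset_type not in DATASETS:
        known = ", ".join(sorted(DATASETS))
        raise KeyError(f"Unknown dataset_type {dataset_type!r}; known types: {known}")
    return DATASETS[dataset_type]


cfg = resolve_dataset("ffhq_encode")
print(cfg["train_source_root"])
print(cfg["test_target_root"])
```

For the encoding task the source and target roots are identical, since the encoder is trained to reconstruct its own input.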

The 1280x1280 FFHQ images can be obtained with the modified version of the official FFHQ script:

Downloading supporting models

Please download the pre-trained models to support the training of StyleGANEX:

| Path | Description |
| --- | --- |
| original_stylegan | StyleGAN trained with the FFHQ dataset |
| toonify_model | StyleGAN finetuned on cartoon dataset for image toonification (cartoon, pixar, arcane) |
| original_psp_encoder | pSp trained with the FFHQ dataset for StyleGAN inversion |
| pretrained_encoder | StyleGANEX encoder pretrained with the synthetic data for StyleGAN inversion |
| styleganex_encoder | StyleGANEX encoder trained with the FFHQ dataset for StyleGANEX inversion |
| editing_vector | Editing vectors for editing face attributes (age, hair color) |
| augmentation_vector | Editing vectors for data augmentation |

The main training script can be found in scripts/train.py.
Intermediate training results are saved to opts.exp_dir. This includes checkpoints, train outputs, and test outputs.

Training styleganex

Note: Our default code is a CPU-compatible version. You can switch to a more efficient version that uses the C++ extension. To do so, change models.stylegan2.op to models.stylegan2.op_old in models/stylegan2/model.py: https://github.com/williamyang1991/StyleGANEX/blob/73b580cc7eb757e36701c094456e9ee02078d03e/models/stylegan2/model.py#L8

Training the styleganex encoder

First, pretrain the encoder on synthetic 1024x1024 images (or download our pretrained encoder, listed above as pretrained_encoder):

python scripts/pretrain.py \
--exp_dir=/path/to/experiment \
--ckpt=/path/to/original_psp_encoder \
--max_steps=2000

Then fine-tune the encoder on real 1280x1280 FFHQ images, starting from the pretrained encoder:

python scripts/train.py \
--dataset_type=ffhq_encode \
--exp_dir=/path/to/experiment \
--checkpoint_path=/path/to/pretrained_encoder \
--max_steps=100000 \
--workers=8 \
--batch_size=8 \
--val_interval=2500 \
--save_interval=50000 \
--start_from_latent_avg \
--id_lambda=0.1 \
--w_norm_lambda=0.001 \
--affine_augment \
--random_crop \
--crop_face
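
When launching many training runs, it can be convenient to keep the options in a dictionary and generate the command line from it. The sketch below does this for the fine-tuning command above; `to_flags` is a hypothetical helper, and the option names and values are copied from that command. It only prints the command and does not start training.

```python
def to_flags(options):
    """Convert an options dict into --key=value flags.

    True becomes a bare switch (e.g. --crop_face); False or None is skipped.
    """
    flags = []
    for key, value in options.items():
        if value is True:
            flags.append(f"--{key}")
        elif value is False or value is None:
            continue
        else:
            flags.append(f"--{key}={value}")
    return flags


# Options copied from the encoder fine-tuning command above.
encoder_options = {
    "dataset_type": "ffhq_encode",
    "exp_dir": "/path/to/experiment",
    "checkpoint_path": "/path/to/pretrained_encoder",
    "max_steps": 100000,
    "workers": 8,
    "batch_size": 8,
    "val_interval": 2500,
    "save_interval": 50000,
    "start_from_latent_avg": True,
    "id_lambda": 0.1,
    "w_norm_lambda": 0.001,
    "affine_augment": True,
    "random_crop": True,
    "crop_face": True,
}

command = ["python", "scripts/train.py", *to_flags(encoder_options)]
print(" \\\n".join(command))
```

The other training commands in this section differ only in their options, so each can be expressed as another dictionary passed to the same helper.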

Sketch to Face

python scripts/train.py \
--dataset_type=ffhq_sketch_to_face \
--exp_dir=/path/to/experiment \
--stylegan_weights=/path/to/original_stylegan \
--max_steps=100000 \
--workers=8 \
--batch_size=8 \
--val_interval=2500 \
--save_interval=10000 \
--start_from_latent_avg \
--w_norm_lambda=0.005 \
--affine_augment \
--random_crop \
--crop_face \
--use_skip \
--skip_max_layer=1 \
--label_nc=1 \
--input_nc=1 \
--use_latent_mask

Segmentation Map to Face

python scripts/train.py \
--dataset_type=ffhq_seg_to_face \
--exp_dir=/path/to/experiment \
--stylegan_weights=/path/to/original_stylegan \
--max_steps=100000 \
--workers=8 \
--batch_size=8 \
--val_interval=2500 \
--save_interval=10000 \
--start_from_latent_avg \
--w_norm_lambda=0.005 \
--affine_augment \
--random_crop \
--crop_face \
--use_skip \
--skip_max_layer=2 \
--label_nc=19 \
--input_nc=19 \
--use_latent_mask 

Super Resolution

python scripts/train.py \
--dataset_type=ffhq_super_resolution \
--exp_dir=/path/to/experiment \
--checkpoint_path=/path/to/styleganex_encoder \
--max_steps=100000 \
--workers=4 \
--batch_size=4 \
--val_interval=2500 \
--save_interval=10000 \
--start_from_latent_avg \
--adv_lambda=0.1 \
--affine_augment \
--random_crop \
--crop_face \
--use_skip \
--skip_max_layer=4 \
--resize_factors=8

To train a single model that supports multiple resize factors, set --skip_max_layer=2 and --resize_factors=1,2,4,8,16.

Video Editing

python scripts/train.py \
--dataset_type=ffhq_edit \
--exp_dir=/path/to/experiment \
--checkpoint_path=/path/to/styleganex_encoder \
--max_steps=100000 \
--workers=2 \
--batch_size=2 \
--val_interval=2500 \
--save_interval=10000 \
--start_from_latent_avg \
--adv_lambda=0.1 \
--tmp_lambda=30 \
--affine_augment \
--crop_face \
--use_skip \
--skip_max_layer=7 \
--editing_w_path=/path/to/editing_vector \
--direction_path=/path/to/augmentation_vector \
--use_att=1 \
--generate_training_data

Video Toonification

python scripts/train.py \
--dataset_type=toonify \
--exp_dir=/path/to/experiment \
--checkpoint_path=/path/to/styleganex_encoder \
--max_steps=55000 \
--workers=2 \
--batch_size=2 \
--val_interval=2500 \
--save_interval=10000 \
--start_from_latent_avg \
--adv_lambda=0.1 \
--tmp_lambda=30 \
--affine_augment \
--crop_face \
--use_skip \
--skip_max_layer=7 \
--toonify_weights=/path/to/toonify_model

Additional Notes

(3) Results

Overview of StyleGANEX inversion and facial attribute/style editing on unaligned faces:

result

Video facial attribute editing:

https://user-images.githubusercontent.com/18130694/224287063-7465a301-4d11-4322-819a-59d548308ce1.mp4


Video face toonification:

https://user-images.githubusercontent.com/18130694/224287136-7e5ce82d-664f-4a23-8ed3-e7005efb3b24.mp4

Citation

If you find this work useful for your research, please consider citing our paper:

@inproceedings{yang2023styleganex,
 title = {StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces},
 author = {Yang, Shuai and Jiang, Liming and Liu, Ziwei and Loy, Chen Change},
 booktitle = {ICCV},
 year = {2023},
}

Acknowledgments

The code is mainly developed based on stylegan2-pytorch, pixel2style2pixel and VToonify.