
StyleRes: Transforming the Residuals for Real Image Editing with StyleGAN (CVPR 2023)

Paper (CVPR) | Paper (Arxiv) | Website | Demo | Colab | Video | Supp.


Our inversion model adapts high-rate feature maps to incoming edits, so the quality of the edit is not compromised while most of the image details are retained. In this example, we show various edits using InterfaceGAN, GANSpace, and StyleCLIP.

To Do List

Prerequisites

Our code assumes that the StyleRes encoder is saved to the checkpoints directory. However, you can change this path with the --checkpoint_path flag.
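
For example, to load the encoder from a different location (the checkpoint file name below is a placeholder, not the actual release name):

python inference.py --checkpoint_path=/path/to/styleres_encoder.pt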

Demo

We provide a GUI application accessible online. The application can also be run locally using the gradio library. First, install the library with the pip install gradio command. Then, run the demo with the following:

python app.py --device=cpu

where the --device argument can be either cpu or cuda.

Inference

Even though our model is not supervised for specific edits, it can integrate with existing image editing frameworks. This repository currently supports edits from InterfaceGAN, GANSpace, StyleCLIP, and GradCtrl.

To edit images in the inference_samples directory, run the inference code as:

python inference.py --datadir=samples/inference_samples --outdir=results --edit_configs=options/editing_options/template.py

or alternatively, run the bash file directly:

bash inference.sh

--edit_configs specifies the configuration for each edit; the template file gives some examples of how to specify one. The --resize_outputs flag can be used to resize the output images to 256x256 resolution. On the first run with a GPU, StyleGAN2 compiles PyTorch extensions, which can take 1-2 minutes. The config files for each edit are in CSV format, so they are easier to inspect when converted into a table.
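
For quick inspection, such a CSV config can be loaded into a table with pandas. A minimal sketch, assuming a placeholder file name (substitute one of the actual config files):

# Load an edit-config CSV and print it as a table.
# "edit_config.csv" is a placeholder name, not an actual file in this repository.
import pandas as pd

config = pd.read_csv("options/editing_options/edit_config.csv")
print(config.to_string())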

If the test images are not aligned, you should align them with a landmark detector. First, install the dlib library:

apt install cmake
pip install dlib scipy
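
For reference, below is a minimal sketch of what landmark detection with dlib looks like; it is not the repository's aligner. The image path is a placeholder, and the standard 68-point predictor file must be downloaded separately.

# Minimal dlib landmark-detection sketch (illustrative only).
# Assumes shape_predictor_68_face_landmarks.dat was downloaded separately.
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = dlib.load_rgb_image("face.jpg")  # placeholder image path
for face in detector(image, 1):          # upsample once to find smaller faces
    landmarks = predictor(image, face)   # 68 facial landmarks
    points = [(landmarks.part(i).x, landmarks.part(i).y) for i in range(68)]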

Then, specify the path of the downloaded landmark detector model with the --aligner_path argument.
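
For example, assuming the standard dlib 68-point predictor was saved under the checkpoints directory:

python inference.py --datadir=samples/inference_samples --outdir=results --edit_configs=options/editing_options/template.py --aligner_path=checkpoints/shape_predictor_68_face_landmarks.dat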

Metrics

We used the CelebA-HQ dataset in our paper. To extract the smiling and non-smiling test images, run:

python dataset_tools.py --attribute Smiling 1 --set test
python dataset_tools.py --attribute Smiling -1 --set test

To extract only the test set (used in age and pose edit evaluations):

python dataset_tools.py --set test

License

This work is made available under the NVIDIA Source Code License, which means it can be used solely for non-commercial purposes.

Related Works

This work builds upon the GAN inversion framework e4e to invert images into the latent and feature spaces of StyleGAN2.

We drew inspiration from various inversion works and used the source code of some of them; the main ones are HFGI, HyperStyle, and IDInvert.

In our work, we show that GAN inversion models should be designed with editing in mind. Although we did not use any of the editing methods during training, we showcase that a wide range of edits found by InterfaceGAN, GANSpace, StyleCLIP, and GradCtrl are still applicable.

Citation

You can cite our work using the following:

@InProceedings{Pehlivan_2023_CVPR,
    author    = {Pehlivan, Hamza and Dalva, Yusuf and Dundar, Aysegul},
    title     = {StyleRes: Transforming the Residuals for Real Image Editing With StyleGAN},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {1828-1837}
}