This repository contains the implementation of the following paper:
Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Yuming Jiang∗, Ziqi Huang∗, Xingang Pan, Chen Change Loy, Ziwei Liu
IEEE International Conference on Computer Vision (ICCV), 2021
[Paper] [Project Page] [CelebA-Dialog Dataset] [Poster] [Video]
You can try our colab demo here. Enjoy!
Clone Repo
git clone git@github.com:yumingj/Talk-to-Edit.git
Create Conda Environment and Install Dependencies
conda env create -f environment.yml
conda activate talk_edit
We provide scripts for editing using our pretrained models.
First, download the pretrained models from this link and put them under ./download/pretrained_models as follows:
./download/pretrained_models
├── 1024_field
│ ├── Bangs.pth
│ ├── Eyeglasses.pth
│ ├── No_Beard.pth
│ ├── Smiling.pth
│ └── Young.pth
├── 128_field
│ ├── Bangs.pth
│ ├── Eyeglasses.pth
│ ├── No_Beard.pth
│ ├── Smiling.pth
│ └── Young.pth
├── arcface_resnet18_110.pth
├── language_encoder.pth.tar
├── predictor_1024.pth.tar
├── predictor_128.pth.tar
├── stylegan2_1024.pth
├── stylegan2_128.pt
├── StyleGAN2_FFHQ1024_discriminator.pth
└── eval_predictor.pth.tar
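Before running the editing scripts, it can help to confirm that the download matches the tree above. The following is a minimal sketch, not part of the repository: the helper names expected_model_paths and missing_model_files are hypothetical, and the file list is copied from the tree above.

```python
from pathlib import Path

# Expected layout, copied from the directory tree above.
FIELD_DIRS = ("1024_field", "128_field")
FIELD_FILES = ("Bangs.pth", "Eyeglasses.pth", "No_Beard.pth", "Smiling.pth", "Young.pth")
TOP_LEVEL_FILES = (
    "arcface_resnet18_110.pth",
    "language_encoder.pth.tar",
    "predictor_1024.pth.tar",
    "predictor_128.pth.tar",
    "stylegan2_1024.pth",
    "stylegan2_128.pt",
    "StyleGAN2_FFHQ1024_discriminator.pth",
    "eval_predictor.pth.tar",
)


def expected_model_paths(root):
    """Return every path listed in the tree above, rooted at `root`."""
    root = Path(root)
    paths = [root / d / f for d in FIELD_DIRS for f in FIELD_FILES]
    paths += [root / f for f in TOP_LEVEL_FILES]
    return paths


def missing_model_files(root="./download/pretrained_models"):
    """Hypothetical helper: list the expected files that are not on disk."""
    return [p for p in expected_model_paths(root) if not p.is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All pretrained models found.")
```

Run it from the repository root so that the relative path ./download/pretrained_models resolves correctly.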
You can try pure image editing without dialog instructions:
python editing_wo_dialog.py \
--opt ./configs/editing/editing_wo_dialog.yml \
--attr 'Bangs' \
--target_val 5
The editing results will be saved in ./results.
You can change attr to one of the following attributes: Bangs, Eyeglasses, Beard, Smiling, and Young (i.e. Age). The target_val can be any of [0, 1, 2, 3, 4, 5].
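To try several attributes or target values in a row, the command above can be generated programmatically. This is a sketch under assumptions: build_editing_command is a hypothetical helper, the attribute names and value range are taken from the text above, and the script only prints the commands rather than running them.

```python
import shlex

# Attribute names and target values as listed above.
ATTRIBUTES = ("Bangs", "Eyeglasses", "Beard", "Smiling", "Young")
TARGET_VALUES = (0, 1, 2, 3, 4, 5)


def build_editing_command(attr, target_val,
                          opt="./configs/editing/editing_wo_dialog.yml"):
    """Hypothetical helper: build the argument list for editing_wo_dialog.py."""
    if attr not in ATTRIBUTES:
        raise ValueError(f"unknown attribute: {attr!r}")
    if target_val not in TARGET_VALUES:
        raise ValueError(f"target_val must be one of {TARGET_VALUES}")
    return [
        "python", "editing_wo_dialog.py",
        "--opt", opt,
        "--attr", attr,
        "--target_val", str(target_val),
    ]


if __name__ == "__main__":
    # Print one command per attribute, all with the maximum target value.
    for attr in ATTRIBUTES:
        print(shlex.join(build_editing_command(attr, 5)))
```

The returned list can be passed to subprocess.run if you prefer to launch the edits from Python instead of copying the printed commands.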
You can also try dialog-based editing, where you talk to the system through the command prompt:
python editing_with_dialog.py --opt ./configs/editing/editing_with_dialog.yml
The editing results will be saved in ./results.
How to talk to the system:
The system can edit five facial attributes: Bangs, Eyeglasses, Beard, Smiling, and Young (i.e. Age).
When prompted with "Enter your request (Press enter when you finish):", you can enter an editing request about one of the five attributes. For example, you can say "Make the bangs longer."
If the system asks a follow-up question such as "Is the length of the bangs just right?" after one round of editing, you can say things like "Yes." / "No." / "Yes, and I also want her to smile more happily."
To end the conversation, you can say things like "That's all" / "Nothing else, thank you."
By default, the above editing is performed on the teaser image. You can change the image to be edited in two ways: 1) change line 11: latent_code_index to another value from 0 to 99; 2) set line 10: latent_code_path to ~, so that an image is randomly generated.
If you want to try editing real images, you can download the real images from this link and put them under ./download/real_images. You can also provide other real images of your choice. You need to set line 12: img_path in editing_with_dialog.yml or editing_wo_dialog.yml to the path of the real image, and set line 11: is_real_image to True.
You can switch the default image size to 128 x 128 by setting line 3: img_res to 128 in the config files.
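If you switch between these settings often, the edits can be scripted instead of made by hand. The sketch below is an illustration only: set_yaml_scalar is a hypothetical helper, it assumes each key sits on its own "key: value" line, and the sample text is a made-up stand-in rather than the real contents of the config files.

```python
import re


def set_yaml_scalar(text, key, value):
    """Return `text` with the first `key: value` line rewritten to `value`.

    Assumes the key sits on its own line; a trailing comment on that
    line is dropped.
    """
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*:).*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)} {value}", text, count=1)
    if count == 0:
        raise KeyError(f"{key!r} not found in config text")
    return new_text


# Made-up stand-in for a config file; the real file has more keys.
sample = "img_res: 1024\nis_real_image: False\nimg_path: ~\n"

edited = set_yaml_scalar(sample, "img_res", 128)
edited = set_yaml_scalar(edited, "is_real_image", True)
# Hypothetical file name; use the path to your own image.
edited = set_yaml_scalar(edited, "img_path", "./download/real_images/example.png")
print(edited)
```

To apply this to a real config, read the file into a string, call the helper, and write the result back. A text-level replacement is used here so that the rest of the file, including comments on other lines, is left untouched.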
To train the Semantic Field, a number of sampled latent codes must be prepared first; the attribute predictor is then used to predict the facial attributes of their corresponding images. The attribute predictor is trained using the fine-grained annotations in the CelebA-Dialog dataset. Here, we provide the latent codes we used. You can download the training data from this link and put it under ./download/train_data as follows:
./download/train_data
├── 1024
│ ├── Bangs
│ ├── Eyeglasses
│ ├── No_Beard
│ ├── Smiling
│ └── Young
└── 128
    ├── Bangs
    ├── Eyeglasses
    ├── No_Beard
    ├── Smiling
    └── Young
We also use some editing latent codes to monitor the training phase. You can download the editing latent codes from this link and put them under ./download/editing_data as follows:
./download/editing_data
├── 1024
│ ├── Bangs.npz.npy
│ ├── Eyeglasses.npz.npy
│ ├── No_Beard.npz.npy
│ ├── Smiling.npz.npy
│ └── Young.npz.npy
└── 128
    ├── Bangs.npz.npy
    ├── Eyeglasses.npz.npy
    ├── No_Beard.npz.npy
    ├── Smiling.npz.npy
    └── Young.npz.npy
All logging files produced during training, e.g., log messages, checkpoints, and snapshots, will be saved to the ./experiments and ./tb_logger directories.
There are 10 configuration files under ./configs/train, named in the format field_<IMAGE_RESOLUTION>_<ATTRIBUTE_NAME>.
Choose the configuration file for the attribute and resolution you want.
For example, to train the semantic field that edits the attribute Bangs at 128x128 image resolution, simply run:
python train.py --opt ./configs/train/field_128_Bangs.yml
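The naming scheme makes it easy to list the training command for every attribute and resolution. The sketch below assumes the attribute part of each file name matches the directory names under ./download/train_data (for example No_Beard); that is an assumption, so check the actual file names in ./configs/train before relying on it. The helper names are hypothetical.

```python
from itertools import product

RESOLUTIONS = (128, 1024)
# Assumed to match the train_data directory names; verify against ./configs/train.
ATTRIBUTE_NAMES = ("Bangs", "Eyeglasses", "No_Beard", "Smiling", "Young")


def train_config_path(resolution, attribute, config_dir="./configs/train"):
    """Build a config path following field_<IMAGE_RESOLUTION>_<ATTRIBUTE_NAME>."""
    return f"{config_dir}/field_{resolution}_{attribute}.yml"


def all_train_commands():
    """Return one training command per (resolution, attribute) pair."""
    return [
        f"python train.py --opt {train_config_path(res, attr)}"
        for res, attr in product(RESOLUTIONS, ATTRIBUTE_NAMES)
    ]


if __name__ == "__main__":
    for command in all_train_commands():
        print(command)
```

Two resolutions times five attributes gives the 10 configuration files mentioned above.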
We provide code for the quantitative results shown in Table 1. Here we use Bangs at 128x128 resolution as an example.
Use the trained semantic field to edit images.
python editing_quantitative.py \
--opt ./configs/train/field_128_bangs.yml \
--pretrained_path ./download/pretrained_models/128_field/Bangs.pth
Evaluate the edited images using quantitative metrics. Change image_num according to the attribute: Bangs: 148, Eyeglasses: 82, Beard: 129, Smiling: 140, Young: 61.
python quantitative_results.py \
--attribute Bangs \
--work_dir ./results/field_128_bangs \
--image_dir ./results/field_128_bangs/visualization \
--image_num 148
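Since image_num differs per attribute, a small lookup table avoids passing the wrong value. This is a sketch only: quantitative_command is a hypothetical helper, the counts are the ones listed above, and whether --attribute expects exactly these spellings for attributes other than Bangs is an assumption.

```python
# Per-attribute image counts, as listed above.
IMAGE_NUM = {
    "Bangs": 148,
    "Eyeglasses": 82,
    "Beard": 129,
    "Smiling": 140,
    "Young": 61,
}


def quantitative_command(attribute, work_dir):
    """Hypothetical helper: build the argument list for quantitative_results.py."""
    image_num = IMAGE_NUM[attribute]  # raises KeyError for unknown attributes
    return [
        "python", "quantitative_results.py",
        "--attribute", attribute,
        "--work_dir", work_dir,
        "--image_dir", f"{work_dir}/visualization",
        "--image_num", str(image_num),
    ]


if __name__ == "__main__":
    print(" ".join(quantitative_command("Bangs", "./results/field_128_bangs")))
```

The work directory is passed in explicitly because only the Bangs path appears in the example above.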
Our CelebA-Dialog Dataset is available for download.
CelebA-Dialog is a large-scale visual-language face dataset with the following features:
The dataset can be employed as the training and test sets for the following computer vision tasks: fine-grained facial attribute recognition, fine-grained facial manipulation, text-based facial generation and manipulation, face image captioning, and broader natural language based facial recognition and manipulation tasks.
If you find our repo useful for your research, please consider citing our paper:
@inproceedings{jiang2021talk,
title={Talk-to-Edit: Fine-Grained Facial Editing via Dialog},
author={Jiang, Yuming and Huang, Ziqi and Pan, Xingang and Loy, Chen Change and Liu, Ziwei},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={13799--13808},
year={2021}
}
@article{jiang2023talk,
title={Talk-to-edit: Fine-grained 2d and 3d facial editing via dialog},
author={Jiang, Yuming and Huang, Ziqi and Wu, Tianxing and Pan, Xingang and Loy, Chen Change and Liu, Ziwei},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2023},
publisher={IEEE}
}
If you have any questions, please feel free to contact us via yuming002@ntu.edu.sg or hu0007qi@ntu.edu.sg.
The codebase is maintained by Yuming Jiang and Ziqi Huang.
Part of the code is borrowed from stylegan2-pytorch, IEP and face-attribute-prediction.