SeqDeepFake: Detecting and Recovering Sequential DeepFake Manipulation

S-Lab, Nanyang Technological University

[Project Page] | [Paper] | [Extension Paper] | [Dataset]

Updates

[02/2024] The dataset link has been updated to Hugging Face.
[09/2023] arXiv extension paper released.
[07/2022] Pretrained models are uploaded.
[07/2022] Project page and dataset are released.
[07/2022] Code is released.

Introduction

This is the official implementation of Detecting and Recovering Sequential DeepFake Manipulation. We introduce a novel research problem: Detecting Sequential DeepFake Manipulation (Seq-DeepFake), which focuses on detecting the sequences of multi-step facial manipulations. To facilitate the study of Seq-DeepFake, we provide a large-scale Sequential DeepFake Dataset and propose a concise yet effective Seq-DeepFake Transformer (SeqFakeFormer).

The framework of the proposed method:

Installation

Download

git clone https://github.com/rshaojimmy/SeqDeepFake.git
cd SeqDeepFake

Environment

We recommend using Anaconda to manage the Python environment:

conda create -n seqdeepfake python=3.6
conda activate seqdeepfake
conda install -c pytorch pytorch=1.6.0 torchvision=0.7.0 cudatoolkit==10.1.243
conda install pandas
conda install tqdm
conda install pillow
pip install tensorboard==2.4.1

Dataset Preparation

A brief introduction

We contribute the first large-scale Sequential DeepFake Dataset, Seq-Deepfake, including ~85k sequentially manipulated face images, each annotated with its ground-truth manipulation sequence.

The images are generated with the following two facial manipulation methods, which yield 28 and 26 types of manipulation sequences (including original), respectively. The lengths of all manipulation sequences range from 1 to 5.

Sequential facial components manipulation (based on CelebAMask-HQ and StyleMapGAN)
Sequential facial attributes manipulation (based on FFHQ and Talk-To-Edit)

Here are some sample images and statistics:

Annotations

Each image in the dataset is annotated with a list of length 5, indicating the ground-truth manipulation sequence. The labels in the sequence are defined as follows:

For Sequential facial components manipulation:

0: 'NA', 1: 'nose', 2: 'eye', 3: 'eyebrow', 4: 'lip', 5: 'hair'

Note: 'NA' means no manipulation is taken in this step.

For Sequential facial attributes manipulation:

0: 'NA', 1: 'Bangs', 2: 'Eyeglasses', 3: 'Beard', 4: 'Smiling', 5: 'Young'

Note: 'NA' means no manipulation is taken in this step.

Note that label 0 serves as the placeholder for sequential manipulations shorter than 5 steps. For example, the annotation for manipulation sequence nose-eye-lip would be: [1, 2, 4, 0, 0]. Original images are annotated with [0, 0, 0, 0, 0].
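
The annotation scheme above can be sketched in a few lines of Python. The label maps are copied from the tables in this section; the function names `encode_sequence` and `decode_sequence` are hypothetical helpers for illustration and are not part of the repository. The sketch also assumes that sequence names use `-` as a separator and that unmanipulated images are named `original`, as in the dataset folder names.

```python
# Label maps copied from the annotation tables above.
COMPONENT_LABELS = {"NA": 0, "nose": 1, "eye": 2, "eyebrow": 3, "lip": 4, "hair": 5}
ATTRIBUTE_LABELS = {"NA": 0, "Bangs": 1, "Eyeglasses": 2, "Beard": 3, "Smiling": 4, "Young": 5}

MAX_STEPS = 5  # every annotation is a list of length 5


def encode_sequence(name, label_map):
    """Turn a name such as 'nose-eye-lip' into a zero-padded label list.

    Hypothetical helper: 'original' maps to the all-zero annotation.
    """
    steps = [] if name == "original" else [label_map[part] for part in name.split("-")]
    if len(steps) > MAX_STEPS:
        raise ValueError("sequence longer than {} steps: {}".format(MAX_STEPS, name))
    # Label 0 ('NA') pads sequences shorter than MAX_STEPS.
    return steps + [0] * (MAX_STEPS - len(steps))


def decode_sequence(labels, label_map):
    """Inverse of encode_sequence: drop the 0 padding and join the step names."""
    inverse = {value: key for key, value in label_map.items()}
    names = [inverse[label] for label in labels if label != 0]
    return "-".join(names) if names else "original"


print(encode_sequence("nose-eye-lip", COMPONENT_LABELS))  # [1, 2, 4, 0, 0]
print(decode_sequence([0, 0, 0, 0, 0], ATTRIBUTE_LABELS))  # original
```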

Prepare data

You can download the Seq-Deepfake dataset through this link: [Dataset]

After unzipping all sub-files, the structure of the dataset should be as follows:

./
├── facial_attributes
│   ├── annotations
│   │   ├── train.csv
│   │   ├── test.csv
│   │   └── val.csv
│   └── images
│       ├── train
│       │   ├── Bangs-Eyeglasses-Smiling-Young
│       │   │   ├── xxxxxx.jpg
│       │   │   ...
│       │   │   └── xxxxxx.jpg
│       │   ...
│       │   ├── Young-Smiling-Eyeglasses
│       │   │   ├── xxxxxx.jpg
│       │   │   ...
│       │   │   └── xxxxxx.jpg
│       │   └── original
│       │       ├── xxxxxx.jpg
│       │       ...
│       │       └── xxxxxx.jpg
│       ├── test
│       │   % the same structure as in train
│       └── val
│           % the same structure as in train
└── facial_components
    ├── annotations
    │   ├── train.csv
    │   ├── test.csv
    │   └── val.csv
    └── images
        ├── train
        │   ├── eyebrow-eye-hair-nose-lip
        │   │   ├── xxxxxx.jpg
        │   │   ...
        │   │   └── xxxxxx.jpg
        │   ...
        │   ├── nose-eyebrow-lip-eye-hair
        │   │   ├── xxxxxx.jpg
        │   │   ...
        │   │   └── xxxxxx.jpg
        │   └── original
        │       ├── xxxxxx.jpg
        │       ...
        │       └── xxxxxx.jpg
        ├── test
        │   % the same structure as in train
        └── val
            % the same structure as in train
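
Before training, it can help to confirm that the unzipped dataset matches the layout shown above. The following is a minimal sketch that checks only the top-level annotation files and image split folders. The function name `missing_entries` is a hypothetical helper and is not part of the repository; it does not inspect the contents of the CSV files or the per-sequence image folders.

```python
from pathlib import Path

# Folder and split names taken from the directory tree above.
DATASET_NAMES = ("facial_attributes", "facial_components")
SPLITS = ("train", "test", "val")


def missing_entries(data_dir):
    """Return the expected files and folders that are absent under data_dir."""
    root = Path(data_dir)
    missing = []
    for name in DATASET_NAMES:
        for split in SPLITS:
            csv_path = root / name / "annotations" / (split + ".csv")
            image_dir = root / name / "images" / split
            if not csv_path.is_file():
                missing.append(str(csv_path))
            if not image_dir.is_dir():
                missing.append(str(image_dir))
    return missing
```

Calling `missing_entries(DATA_DIR)` with the dataset root returns an empty list when all twelve expected entries are present, and otherwise lists the paths that still need to be unzipped or moved.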

Training

Single-GPU

Modify train.sh and run:

sh train.sh

Some of the arguments are described below:

Args	Description
CONFIG	Path of the network and optimization configuration file.
DATA_DIR	Directory to the downloaded dataset.
DATASET_NAME	Name of the selected manipulation type. Choose from 'facial_components' and 'facial_attributes'.
RESULTS_DIR	Directory to save logs and checkpoints.

You can change the network and optimization configurations by adding new configuration files under the directory ./configs/.

Multiple-GPUs (Slurm)

We also provide a Slurm script that supports multi-GPU training:

sh train_slurm.sh

where PARTITION and NODE should be modified according to your own environment. The number of GPUs to be used can be set through the NUM_GPU argument.

Testing

Modify test.sh and run:

sh test.sh

For the arguments in `test.sh`, please refer to the training instructions above, plus the following ones:

Args	Description
TEST_TYPE	The evaluation metrics to use. Choose from 'fixed' and 'adaptive'.
LOG_NAME	Should be set according to the log_name of your trained checkpoint to be tested.
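
The difference between the two `TEST_TYPE` options can be illustrated with a small sketch. It follows our reading of the metric definitions in the paper: Fixed-Acc compares all 5 annotated steps, padding included, while Adaptive-Acc compares only the first N steps, where N is the longer of the predicted and ground-truth sequences. The function names are hypothetical, the step-level averaging and the handling of pairs where both sequences are empty are assumptions, and the repository's evaluation code may differ in detail.

```python
def unpadded_length(seq):
    """Number of steps that remain after removing trailing 0 ('NA') padding."""
    n = len(seq)
    while n > 0 and seq[n - 1] == 0:
        n -= 1
    return n


def sequence_accuracy(preds, targets, mode="fixed"):
    """Step-level accuracy over pairs of length-5 label lists (illustrative only)."""
    correct = 0
    total = 0
    for pred, target in zip(preds, targets):
        if mode == "fixed":
            steps = len(target)  # always compare every position
        elif mode == "adaptive":
            # Compare only up to the longer of the two unpadded sequences.
            # Assumption: a pair where both are all-zero adds no comparisons.
            steps = max(unpadded_length(pred), unpadded_length(target))
        else:
            raise ValueError("mode must be 'fixed' or 'adaptive'")
        correct += sum(p == t for p, t in zip(pred[:steps], target[:steps]))
        total += steps
    return correct / total if total else 0.0


preds = [[1, 2, 4, 0, 0], [1, 3, 0, 0, 0]]
targets = [[1, 2, 4, 0, 0], [1, 2, 0, 0, 0]]
print(sequence_accuracy(preds, targets, "fixed"))     # 0.9
print(sequence_accuracy(preds, targets, "adaptive"))  # 0.8
```

In this toy example the adaptive score is lower because matching padding positions no longer count as correct steps, which is consistent with Adaptive-Acc being the stricter number in the benchmark tables.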

We also provide a Slurm script for testing:

sh test_slurm.sh

Benchmark Results

Here we list the performance of three SOTA deepfake detection methods and our method. Please refer to our paper for more details.

Facial Components Manipulation

Method	Reference	Fixed-Acc ${\uparrow}$	Adaptive-Acc ${\uparrow}$
DRN	Wang et al.	66.06	45.79
MA	Zhao et al.	71.31	52.94
Two-Stream	Luo et al.	71.92	53.89
SeqFakeFormer	Shao et al.	72.65	55.30

Facial Attributes Manipulation

Method	Reference	Fixed-Acc ${\uparrow}$	Adaptive-Acc ${\uparrow}$
DRN	Wang et al.	64.42	43.20
MA	Zhao et al.	67.58	47.48
Two-Stream	Luo et al.	66.77	46.38
SeqFakeFormer	Shao et al.	68.86	49.63

Pretrained Models

We also provide the pretrained models that produced our results in the benchmark tables:

Model	Description
pretrained-r50-c	Trained on `facial_components` with `resnet50` backbone.
pretrained-r50-a	Trained on `facial_attributes` with `resnet50` backbone.

To try the pretrained checkpoints:

Download the checkpoints from the links in the table, unzip them, and place them under the ./results folder with the following structure:

results
└── resnet50
    ├── facial_attributes
    │   └── pretrained-r50-a
    │       └── snapshots
    │           ├── best_model_adaptive.pt
    │           └── best_model_fixed.pt
    └── facial_components
        └── pretrained-r50-c
            └── snapshots
                ├── best_model_adaptive.pt
                └── best_model_fixed.pt

In test.sh, set DATA_DIR to the root of your Seq-DeepFake dataset. Set LOG_NAME and DATASET_NAME to 'pretrained-r50-c' and 'facial_components', or to 'pretrained-r50-a' and 'facial_attributes', respectively.
Run test.sh.
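
To double-check that a checkpoint sits where the steps above expect it, the folder structure can be expressed as a small path helper. This is a sketch based only on the directory tree shown in this section. The function name `checkpoint_path` is hypothetical, and the idea that the 'fixed' and 'adaptive' test types correspond to `best_model_fixed.pt` and `best_model_adaptive.pt` is an assumption inferred from the file names, not a statement about how test.sh resolves paths.

```python
from pathlib import Path


def checkpoint_path(results_dir, backbone, dataset_name, log_name, test_type):
    """Build the checkpoint path implied by the results tree above (illustrative)."""
    if test_type not in ("fixed", "adaptive"):
        raise ValueError("test_type must be 'fixed' or 'adaptive'")
    return (Path(results_dir) / backbone / dataset_name / log_name
            / "snapshots" / "best_model_{}.pt".format(test_type))


path = checkpoint_path("results", "resnet50", "facial_components",
                       "pretrained-r50-c", "fixed")
print(path.as_posix())
# results/resnet50/facial_components/pretrained-r50-c/snapshots/best_model_fixed.pt
```

Checking `path.is_file()` before running the test script gives an early signal that the archive was unzipped into the wrong folder.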

Citation

If you find this work useful for your research, please kindly cite our paper:

@inproceedings{shao2022seqdeepfake,
  title={Detecting and Recovering Sequential DeepFake Manipulation},
  author={Shao, Rui and Wu, Tianxing and Liu, Ziwei},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022}
}
