FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant [Paper][Project Page]
Zhengchao Huang, Bin Xia, Zicheng Lin, Zhun Mou, Wenming Yang
Clone this repository and navigate to the FFAA folder:

```bash
git clone https://github.com/thu-huangzc/FFAA.git
cd FFAA
```
Install the package:

```bash
conda create -n ffaa python=3.9 -y
conda activate ffaa
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
Install additional packages for training:

```bash
pip install -e ".[train]"
```

The training dataset has not been made public yet, so you do not need to install these packages for now.
To upgrade to the latest code base:

```bash
git pull
pip install -e .
```
FFAA training consists of two stages: (1) Fine-tuning the MLLM with hypothetical prompts: we add hypothetical prompts to the 20K FFA-VQA dataset that presume the face is either real or fake prior to analysis; fine-tuning the MLLM on this dataset enables it to generate answers under varying hypotheses. (2) Training MIDS with historical answers: we use the fine-tuned MLLM to collect answers from unused samples in the MA dataset and train MIDS on them.
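For illustration only, the sketch below shows how hypothetical prompts of this kind could be attached to a question. The actual prompt wording comes from the released FFA-VQA data; `build_prompts` and its text are assumptions.

```python
# Illustrative sketch: the real hypothetical prompts are defined in the FFA-VQA
# dataset; the wording below is an assumption used only to show the idea.
BASE_QUESTION = "Is this face real or fake? Analyze the face and explain your reasoning."

def build_prompts(question: str = BASE_QUESTION) -> dict:
    """Return one non-hypothetical prompt and two hypothetical variants."""
    return {
        "none": question,                                  # no prior assumption
        "real": "Suppose the face is real. " + question,   # presume real before analysis
        "fake": "Suppose the face is fake. " + question,   # presume fake before analysis
    }

if __name__ == "__main__":
    for tag, prompt in build_prompts().items():
        print(f"[{tag}] {prompt}")
```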
FFAA is trained on 2 RTX 3090 GPUs, each with 24 GB of memory.
Module | Global Batch Size | Learning rate | Epochs | LoRA Rank | LoRA alpha |
---|---|---|---|---|---|
LLaVA-v1.6-Mistral-7B | 16 | 1e-4 | 3 | 32 | 48 |
Module | Global Batch Size | Learning rate | Epochs | Weight decay |
---|---|---|---|---|
MIDS | 48 | 1e-4 | 2 | 1e-5 |
Other hyperparameter settings can be found in scripts/train.
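For reference, the LoRA settings in the table could be expressed with Hugging Face PEFT roughly as below; the authoritative configuration lives in scripts/train, and the target modules and dropout here are assumptions.

```python
from peft import LoraConfig

# Rank and alpha come from the table above; everything else is an assumed placeholder.
lora_config = LoraConfig(
    r=32,                                                     # LoRA rank
    lora_alpha=48,                                            # LoRA alpha
    lora_dropout=0.05,                                        # assumed, see scripts/train
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
# The adapter is attached with peft.get_peft_model(base_mllm, lora_config) and trained
# for 3 epochs at a 1e-4 learning rate with a global batch size of 16.
```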
We use LLaVA as our base MLLM module; you can also choose other MLLMs as the backbone. In the paper, we select LLaVA-v1.6-Mistral-7B, and here are the available download links: Huggingface, Model Scope.
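For example, the checkpoint can be fetched with huggingface_hub; the repository id below is an assumption, so use the id from the download link above if it differs.

```python
from huggingface_hub import snapshot_download

# Assumed Hugging Face repository id for LLaVA-v1.6-Mistral-7B.
snapshot_download(
    repo_id="liuhaotian/llava-v1.6-mistral-7b",
    local_dir="checkpoints/llava-v1.6-mistral-7b",
)
```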
Download the 20K FFA-VQA dataset containing hypothetical prompts and place the folder in ./playground/
Training script with DeepSpeed: finetune_mistral_lora.sh
You can use the fine-tuned MLLM to collect historical answer data from the unused face images. For each image, we use one non-hypothetical prompt and two hypothetical prompts to obtain three answers. Since MIDS is designed to select the correct answer from multiple candidates, an image is not added to the dataset if all three answers agree.
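A minimal sketch of this filtering step, assuming a hypothetical answer_fn(image, prompt) wrapper around the fine-tuned MLLM and a crude verdict parser (neither is part of the released code):

```python
def parse_verdict(answer: str) -> str:
    """Crude illustrative parser: reduce a free-form answer to 'real' or 'fake'."""
    return "fake" if "fake" in answer.lower() else "real"

def collect_historical_answers(image_paths, prompts, answer_fn):
    """Keep an image only when the three answers do not all agree."""
    dataset = []
    for image_path in image_paths:
        answers = {tag: answer_fn(image_path, p) for tag, p in prompts.items()}
        verdicts = {parse_verdict(a) for a in answers.values()}
        if len(verdicts) > 1:  # disagreement makes the sample useful for training MIDS
            dataset.append({"image": str(image_path), "answers": answers})
    return dataset
```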
Alternatively, you can download the 90K Mistral-FFA-VQA dataset we provide.
After downloading the dataset, place the folder in ./playground/
Training script with DeepSpeed: train_mids_v1.sh
We evaluate models on OW-FFA-Bench, which consists of 6 generalization test sets.
First, download OW-FFA-Bench and the Multi-attack test set following benchmark.md.
Second, organize the folders in ./benchmark as follows:

```
benchmark/
    dfd/
        imgs/
        dfd.json
    dfdc/
    dpf/
    ma/
    mffdi/
    pgc/
    wfir/
```
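An optional sanity check of this layout (not part of the repository; it assumes every test set follows the imgs/ plus <name>.json pattern shown for dfd):

```python
from pathlib import Path

BENCHMARKS = ["dfd", "dfdc", "dpf", "ma", "mffdi", "pgc", "wfir"]

def check_benchmark_layout(root: str = "./benchmark") -> None:
    """Print which test-set folders are present with the expected files."""
    for name in BENCHMARKS:
        folder = Path(root) / name
        ok = (folder / "imgs").is_dir() and (folder / f"{name}.json").is_file()
        print(f"{name:6s} {'ok' if ok else 'MISSING'}")

if __name__ == "__main__":
    check_benchmark_layout()
```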
Third, run the eval scripts:

```bash
CUDA_VISIBLE_DEVICES=0 python eval_llava.py --benchmark BENCHMARK_NAME --model_name llava-mistral-7b
CUDA_VISIBLE_DEVICES=0 python eval.py --benchmark BENCHMARK_NAME --model mistral --generate_num 3 --eval_num -1
```
BENCHMARK_NAME can be one of dfd, dfdc, dpf, ma, mffdi, pgc, or wfir. Setting --eval_num to -1 means all test samples are evaluated. Last, the results will be saved in ./results/.
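To run the whole benchmark in one go, a small driver like the following could wrap the two commands above (an optional convenience sketch, not part of the repository):

```python
import os
import subprocess

BENCHMARKS = ["dfd", "dfdc", "dpf", "ma", "mffdi", "pgc", "wfir"]
env = {**os.environ, "CUDA_VISIBLE_DEVICES": "0"}

for name in BENCHMARKS:
    # Generate MLLM answers for the test set, then score them, as in the commands above.
    subprocess.run(
        ["python", "eval_llava.py", "--benchmark", name, "--model_name", "llava-mistral-7b"],
        check=True, env=env,
    )
    subprocess.run(
        ["python", "eval.py", "--benchmark", name, "--model", "mistral",
         "--generate_num", "3", "--eval_num", "-1"],
        check=True, env=env,
    )
```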
You can place the test images in ./playground/test_images. The prompts are shown in prompts.txt. Then change the path of the image you want to test in inference.py and run as follows:

```bash
CUDA_VISIBLE_DEVICES=0 python inference.py --crop 1 --visualize 1
```
--crop 1 means the face in the image will be automatically cropped. --visualize 1 means the heatmaps of MIDS will be visualized and saved in ./heatmaps/.
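For reference, automatic face cropping in the spirit of --crop 1 could be approximated with OpenCV's Haar cascade detector; this is only an illustrative sketch, not necessarily the detector used by inference.py.

```python
import cv2

def crop_largest_face(image_path: str, margin: float = 0.2):
    """Illustrative crop: detect the largest face and return it with a small margin."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return image  # fall back to the full image when no face is detected
    x, y, w, h = max(faces, key=lambda box: box[2] * box[3])  # largest detection
    dx, dy = int(w * margin), int(h * margin)
    return image[max(0, y - dy): y + h + dy, max(0, x - dx): x + w + dx]
```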
If you find FFAA useful for your research and applications, please cite using this BibTeX:
```bibtex
@article{huang2024ffaa,
  title={FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant},
  author={Huang, Zhengchao and Xia, Bin and Lin, Zicheng and Mou, Zhun and Yang, Wenming},
  journal={arXiv preprint arXiv:2408.10072},
  year={2024}
}
```