
The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models”

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models



🎉 News

🔥 Takeaways

Diverse modalities: Our dataset includes high-quality multimodal data generated by recent popular synthetic models, covering $\color{#ffb60dde}{\textbf{video}}$, $\color{rgba(83, 164, 251, 1)}{\textbf{image}}$, $\color{rgba(41, 208, 108, 1)}{\textbf{3D}}$, $\color{rgb(166, 72, 255)}{\textbf{text}}$, $\color{rgb(255, 58, 58)}{\textbf{audio}}$.
Heterogeneous categories: Our collected dataset includes 26 detailed categories across different modalities, such as specialized satellite and medical images; texts like philosophy and ancient Chinese; and $\color{rgb(255, 58, 58)}{\textbf{audio}}$ data like singing voices, environmental sounds, and music.
Multi-level tasks: LOKI includes basic “Synthetic or Real” labels, suitable for fundamental question settings like true/false and multiple-choice questions. It also incorporates fine-grained anomalies for inferential explanations, enabling tasks like abnormal detail selection and abnormal explanation, to explore LMMs’ capabilities in explainable synthetic data detection.
Multimodal synthetic data evaluation framework: We propose a comprehensive evaluation framework that supports inputs of various data formats and over 25 mainstream multimodal models.

📚 Contents

🛠️ Installation

Please clone our repository and change into its folder:

git clone https://github.com/opendatalab/LOKI.git
cd LOKI

Switch to the dev branch, create a new Python environment, and install the requirements:

git checkout dev
conda create -n loki python=3.10
conda activate loki
pip install -e .
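
Optionally, verify the environment before moving on. The check below is a minimal sketch and assumes PyTorch and accelerate are installed as dependencies of the package (accelerate is used by the evaluation command later):

# Check that CUDA is visible to PyTorch (assumes torch is a dependency of the package)
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
# Write a default single-node accelerate config (assumes accelerate is installed)
accelerate config default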

📦 Data Preparation

LOKI contains media data across 5 modalities: $\color{#ffb60dde}{\textbf{video}}$, $\color{rgba(83, 164, 251, 1)}{\textbf{image}}$, $\color{rgba(41, 208, 108, 1)}{\textbf{3D}}$, $\color{rgb(166, 72, 255)}{\textbf{text}}$, $\color{rgb(255, 58, 58)}{\textbf{audio}}$.

To examine the performance of LMMs on each modality, you first need to download the data from 🤗 Hugging Face.

Then, unzip the dataset and put it under the current folder.
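
For reference, the snippet below sketches one way to fetch and unpack the data from the command line with the Hugging Face CLI; the dataset repository id and archive name are placeholders, not the actual identifiers.

# Download the dataset (replace <dataset_repo_id> with the LOKI dataset id on Hugging Face)
huggingface-cli download <dataset_repo_id> --repo-type dataset --local-dir ./loki_download
# Unpack the archives into the current folder (archive name is illustrative)
unzip ./loki_download/<archive>.zip -d .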

Your media_data folder should look like:

├── 3D
│   └── ...
├── image
│   └── ...
└── video

🤖 Model Preparation

Our evaluation framework supports more than 20 mainstream foundation models. Please see here for the full model list.

Most models can be run off the shelf with our framework. For models that require special environment setup, we refer readers here for more information.

📊 Evaluation

Now, start evaluating!

The configs folder contains configurations for the models and LOKI tasks, which are read and used by run.py.
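
To see which options are available, you can simply list the two config directories; the paths below are taken from the example command that follows.

# List the available model and task configurations
ls configs/models/
ls configs/tasks/image/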

For example, to evaluate the Phi-3.5-Vision model on LOKI's image judgement task, run:

accelerate launch --num_processes=4 --main_process_port=12005 run.py --model_config_path configs/models/phi_3.5_vision_config.yaml --task_config_path configs/tasks/image/image_tf_loki.yaml --batch_size 1
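
If you only have a single GPU, a hypothetical single-process variant of the same command is sketched below; only the accelerate flags change, while the run.py arguments are unchanged.

# Single-process variant of the command above (only the accelerate flags differ)
accelerate launch --num_processes=1 --main_process_port=12005 run.py --model_config_path configs/models/phi_3.5_vision_config.yaml --task_config_path configs/tasks/image/image_tf_loki.yaml --batch_size 1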

😄 Acknowledgement

Some of the design philosophy of our framework is adopted from lmms-eval.

📜 Citations

@article{ye2024loki,
  title={LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models},
  author={Ye, Junyan and Zhou, Baichuan and Huang, Zilong and Zhang, Junan and Bai, Tianyi and Kang, Hengrui and He, Jun and Lin, Honglin and Wang, Zihao and Wu, Tong and others},
  journal={arXiv preprint arXiv:2410.09732},
  year={2024}
}