• Diverse modalities: Our dataset includes high-quality multimodal data generated by recent, popular generative models, covering
$\color{#ffb60dde}{\textbf{video}}$,
$\color{rgba(83, 164, 251, 1)}{\textbf{image}}$,
$\color{rgba(41, 208, 108, 1)}{\textbf{3D}}$,
$\color{rgb(166, 72, 255)}{\textbf{text}}$,
and $\color{rgb(255, 58, 58)}{\textbf{audio}}$.
• Heterogeneous categories: Our collected dataset includes 26 detailed categories across different modalities, such as specialized satellite and
medical images; texts like philosophy and
ancient Chinese; and $\color{rgb(255, 58, 58)}{\textbf{audio}}$ data like
singing voices, environmental sounds, and music.
• Multi-level tasks: LOKI includes basic “Synthetic or Real” labels, suitable for fundamental question settings like true/false and multiple-choice questions. It also incorporates fine-grained
anomaly annotations for inferential explanation, enabling tasks like abnormal detail selection and abnormal
explanation, to explore LMMs’ capabilities in explainable synthetic data detection (an illustrative item sketch follows this list).
• Multimodal synthetic data evaluation framework: We propose a comprehensive evaluation framework
that supports inputs of various data formats and over 25 mainstream multimodal models.
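As a rough illustration of how these task levels combine in a single benchmark item, here is a hypothetical sketch; the field names are purely illustrative, not the repository's actual schema:

```yaml
# Hypothetical item sketch -- field names are illustrative only
modality: image
question_type: true_false      # or multiple_choice / abnormal_selection
question: "Is this image synthetic or real?"
label: synthetic
anomalies:
  - region: left hand
    explanation: an extra finger is visible
```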
Please clone our repository and change into that folder:

```bash
git clone https://github.com/opendatalab/LOKI.git
cd LOKI
```
Switch to the dev branch, create a new Python environment, and install the requirements:

```bash
git checkout dev
conda create -n loki python=3.10
conda activate loki
pip install -e .
```
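Before launching a multi-GPU run, it may help to confirm that the environment can see your devices. This assumes PyTorch is pulled in by the requirements, which is typical for LMM evaluation frameworks:

```bash
# Assumes PyTorch is among the installed requirements; checks CUDA visibility
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```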
LOKI contains media data across 5 modalities: $\color{#ffb60dde}{\textbf{video}}$, $\color{rgba(83, 164, 251, 1)}{\textbf{image}}$, $\color{rgba(41, 208, 108, 1)}{\textbf{3D}}$, $\color{rgb(166, 72, 255)}{\textbf{text}}$, $\color{rgb(255, 58, 58)}{\textbf{audio}}$.
To examine the performance of LMMs on each modality, you first need to download the data from 🤗 Hugging Face.
Then, unzip the dataset and put it under the current folder.
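One way to fetch the data is the huggingface_hub CLI; the dataset repository id below is an assumption, so substitute the id actually linked above:

```bash
# Repo id is illustrative -- replace it with the dataset linked above
huggingface-cli download opendatalab/LOKI --repo-type dataset --local-dir media_data
```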
Your media_data folder should look like:

```
├── 3D
├── image
└── video
```
Our evaluation framework supports more than 20 mainstream foundation models. Please see here for the full model list.
Most of our models can be run off-the-shelf with our framework; for models that require special environment setup, we refer readers here for more information.
Now, start evaluating!
The configs folder contains configurations for the models and LOKI tasks, which are read and used by run.py.
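As a rough illustration of what such a config could contain, here is a hypothetical sketch; the keys shown are assumptions, so consult the shipped files under configs/models/ for the actual schema:

```yaml
# Hypothetical model config sketch -- see configs/models/ for real keys
model_name: phi_3.5_vision
model_path: microsoft/Phi-3.5-vision-instruct  # HF checkpoint id (assumed key)
generation:
  max_new_tokens: 256
  temperature: 0.0
```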
For example, to evaluate the Phi-3.5-Vision model on LOKI's image judgement task, your command should be:

```bash
accelerate launch --num_processes=4 --main_process_port=12005 run.py \
  --model_config_path configs/models/phi_3.5_vision_config.yaml \
  --task_config_path configs/tasks/image/image_tf_loki.yaml \
  --batch_size 1
```
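To evaluate the same model on another modality, point --task_config_path at a different task config. The path below follows the image task's naming pattern and is an assumption; check configs/tasks/ for the actual file names:

```bash
# Task config path assumed by analogy with the image task above
accelerate launch --num_processes=4 --main_process_port=12005 run.py \
  --model_config_path configs/models/phi_3.5_vision_config.yaml \
  --task_config_path configs/tasks/video/video_tf_loki.yaml \
  --batch_size 1
```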
Some of the design philosophy of our framework is adapted from lmms-eval.
```bibtex
@article{ye2024loki,
  title={LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models},
  author={Ye, Junyan and Zhou, Baichuan and Huang, Zilong and Zhang, Junan and Bai, Tianyi and Kang, Hengrui and He, Jun and Lin, Honglin and Wang, Zihao and Wu, Tong and others},
  journal={arXiv preprint arXiv:2410.09732},
  year={2024}
}
```