Official implementation and dataset for the paper "DFADD: The Diffusion and Flow-matching based Audio Deepfake Dataset"
MIT License

DFADD: The Diffusion and Flow-matching based Audio Deepfake Dataset

Paper, Dataset, Demo, Homepage

SLT 2024

Key Features:

  1. DFADD is the first dataset that includes spoofed speech generated specifically with diffusion and flow-matching based TTS models.

  2. Compared to anti-spoofing models trained on ASVspoof, models trained on DFADD achieve lower Equal Error Rates (EERs) when evaluated on spoofed speech generated by diffusion and flow-matching based TTS models.
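
For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (spoofed speech accepted as bona fide) equals the false rejection rate (bona fide speech rejected). The following is a minimal sketch of how it can be computed from detector scores. It is not the evaluation code used in the paper: the function name `compute_eer` and the convention that a higher score means "more likely bona fide" are assumptions made for illustration.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the Equal Error Rate from two lists of detector scores.

    Assumption (illustrative): a higher score means "more likely bona fide",
    and an utterance is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the candidate threshold where the false
    acceptance rate and the false rejection rate are closest.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    best = None  # (gap, eer, threshold)
    # Every observed score is a candidate decision threshold.
    for threshold in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False acceptance: spoofed utterances scored at or above the threshold.
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        # False rejection: bona fide utterances scored below the threshold.
        frr = sum(s < threshold for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores, made up for this example (not DFADD results).
    bonafide = [0.9, 0.8, 0.7, 0.4]
    spoof = [0.6, 0.3, 0.2, 0.1]
    eer, threshold = compute_eer(bonafide, spoof)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

A lower EER means the detector separates bona fide and spoofed speech more cleanly; a detector that separates the two sets perfectly reaches an EER of 0.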

Dataset Download

  1. HuggingFace dataset

      from datasets import load_dataset
      DFADD = load_dataset('isjwdu/DFADD')

  2. ZIP files

Acknowledgement

DFADD is created based on several official and unofficial open-source implementations and datasets:

VCTK dataset, licensed under CC-BY-4.0.

LJ Speech dataset, released into the public domain.

PFlow-TTS (Unofficial), https://github.com/p0p4k/pflowtts_pytorch.

NaturalSpeech2 (Unofficial), https://github.com/CODEJIN/NaturalSpeech2.

Grad-TTS (Official), https://github.com/huawei-noah/Speech-Backbones/tree/main/Grad-TTS.

StyleTTS 2 (Official), https://github.com/yl4579/StyleTTS2.

Matcha-TTS (Official), https://github.com/shivammehta25/Matcha-TTS.

Citation

Please consider citing our paper if this work helps your research. Thank you!

@misc{du2024dfadddiffusionflowmatchingbased,
      title={DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset}, 
      author={Jiawei Du and I-Ming Lin and I-Hsiang Chiu and Xuanjun Chen and Haibin Wu and Wenze Ren and Yu Tsao and Hung-yi Lee and Jyh-Shing Roger Jang},
      year={2024},
      eprint={2409.08731},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2409.08731}, 
}