SLT 2024
Key Features:
DFADD is the first dataset that includes spoofed speech generated specifically with diffusion-based and flow-matching-based TTS models.
Compared with anti-spoofing models trained on ASVspoof, models trained on DFADD achieve lower Equal Error Rates (EERs) when evaluated on spoofed speech generated by diffusion-based and flow-matching-based methods.
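For readers unfamiliar with the metric: the EER is the error rate at the decision threshold where the false acceptance rate (spoofed speech accepted as bona fide) equals the false rejection rate (bona fide speech rejected). The snippet below is a minimal sketch of that definition, not the evaluation code used in the paper. The function name `compute_eer`, the toy scores, and the convention that a higher score means "more likely bona fide" are assumptions made for illustration.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the Equal Error Rate from detector scores.

    Assumption: a higher score means the detector considers the
    utterance more likely to be bona fide.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one value above
    # the maximum so that "reject everything" is also considered.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, eer = None, None
    for t in thresholds:
        # False acceptance rate: spoofed utterances scored at or above t.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: bona fide utterances scored below t.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical, not taken from DFADD).
bonafide = [0.9, 0.6, 0.4, 0.8]
spoof = [0.1, 0.5, 0.3, 0.7]
print(compute_eer(bonafide, spoof))  # 0.25
```

With finite score lists the two error rates may never be exactly equal, so this sketch reports the average of the two at the closest threshold; evaluation toolkits often interpolate instead.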
# Load DFADD from the Hugging Face Hub (requires the `datasets` package)
from datasets import load_dataset
DFADD = load_dataset('isjwdu/DFADD')
DFADD is built on the following datasets and open-source implementations, both official and unofficial:
VCTK dataset, licensed under CC-BY-4.0.
LJ Speech dataset, released into the public domain.
PFlow-TTS (Unofficial), https://github.com/p0p4k/pflowtts_pytorch.
NaturalSpeech2 (Unofficial), https://github.com/CODEJIN/NaturalSpeech2.
Grad-TTS (Official), https://github.com/huawei-noah/Speech-Backbones/tree/main/Grad-TTS.
StyleTTS2 (Official), https://github.com/yl4579/StyleTTS2.
Matcha-TTS (Official), https://github.com/shivammehta25/Matcha-TTS.
Please consider citing our paper if this work helps your research. Thank you!
@misc{du2024dfadddiffusionflowmatchingbased,
title={DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset},
author={Jiawei Du and I-Ming Lin and I-Hsiang Chiu and Xuanjun Chen and Haibin Wu and Wenze Ren and Yu Tsao and Hung-yi Lee and Jyh-Shing Roger Jang},
year={2024},
eprint={2409.08731},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2409.08731},
}