Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: An RFP dataset for Real, Fake, and Partially fake audio detection
summary: Recent advances in deep learning have enabled the creation of
natural-sounding synthesised speech. However, attackers have also utilised
these technologies to conduct attacks such as phishing. Numerous public
datasets have been created to facilitate the development of effective detection
models. However, available datasets contain only entirely fake audio;
therefore, detection models may miss attacks that replace a short section of
the real audio with fake audio. In recognition of this problem, the current
paper presents the RFP dataset, which comprises five distinct audio types:
partial fake (PF), audio with noise, voice conversion (VC), text-to-speech
(TTS), and real. The data are then used to evaluate several detection models,
revealing that the available detection models incur a markedly higher equal
error rate (EER) when detecting PF audio instead of entirely fake audio. The
lowest EER recorded was 25.42%. Therefore, we believe that creators of
detection models must seriously consider using datasets like RFP that include
PF and other types of fake audio.
id: http://arxiv.org/abs/2404.17721v1
judge
Write [vclab::confirmed] or [vclab::excluded] in comment.