Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: Light Convolutional Neural Network with Feature Genuinization for
Detection of Synthetic Speech Attacks
summary: Modern text-to-speech (TTS) and voice conversion (VC) systems produce natural
sounding speech that questions the security of automatic speaker verification
(ASV). This makes detection of such synthetic speech very important to
safeguard ASV systems from unauthorized access. Most of the existing spoofing
countermeasures perform well when the nature of the attacks is made known to
the system during training. However, their performance degrades in face of
unseen nature of attacks. In comparison to the synthetic speech created by a
wide range of TTS and VC methods, genuine speech has a more consistent
distribution. We believe that the difference between the distribution of
synthetic and genuine speech is an important discriminative feature between the
two classes. In this regard, we propose a novel method referred to as feature
genuinization that learns a transformer with convolutional neural network (CNN)
using the characteristics of only genuine speech. We then use this
genuinization transformer with a light CNN classifier. The ASVspoof 2019
logical access corpus is used to evaluate the proposed method. The studies show
that the proposed feature genuinization based LCNN system outperforms other
state-of-the-art spoofing countermeasures, depicting its effectiveness for
detection of synthetic speech attacks.
Thank you very much for your contribution!
Your judgement is reflected in arXivSearches.json and will be used for VCLab's activities.
Thank you so much.
id: http://arxiv.org/abs/2009.09637v1
judge
Write [vclab::confirmed] or [vclab::excluded] in a comment.