Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: V2S attack: building DNN-based voice conversion from automatic speaker
verification
summary: This paper presents a new voice impersonation attack using voice conversion
(VC). Enrolling personal voices for automatic speaker verification (ASV) offers
natural and flexible biometric authentication systems. Basically, the ASV
systems do not include the users' voice data. However, if the ASV system is
unexpectedly exposed and hacked by a malicious attacker, there is a risk that
the attacker will use VC techniques to reproduce the enrolled users' voices. We
name this the "verification-to-synthesis (V2S) attack" and propose VC
training with the ASV and pre-trained automatic speech recognition (ASR) models
and without the targeted speaker's voice data. The VC model reproduces the
targeted speaker's individuality by deceiving the ASV model and restores the
phonetic properties of an input voice by matching phonetic posteriorgrams
predicted by the ASR model. The experimental evaluation compares converted
voices between the proposed method, which does not use the targeted speaker's
voice data, and standard VC, which does. The experimental results
demonstrate that the proposed method performs comparably to existing VC
methods trained on a very small amount of parallel voice data.
Thank you very much for your contribution!
Your judgement is reflected in arXivSearches.json and will be used for VCLab's activity.
Thank you so much.
id: http://arxiv.org/abs/1908.01454v1
judge
Write 'confirmed' or 'excluded' in [ ] as a comment.