Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: Taco-VC: A Single Speaker Tacotron based Voice Conversion with Limited
Data
summary: This paper introduces Taco-VC, a novel architecture for voice conversion (VC)
based on the Tacotron synthesizer, a sequence-to-sequence model with
attention. The training of multi-speaker voice conversion systems
requires a large amount of resources, both in training and corpus size. Taco-VC
is implemented using a single speaker Tacotron synthesizer based on Phonetic
Posteriorgrams (PPGs) and a single speaker Wavenet vocoder conditioned on Mel
Spectrograms. To enhance the converted speech quality, the outputs of the
Tacotron are passed through a novel speech-enhancement network, which is
composed of a combination of phoneme recognition and Tacotron networks. Our
system is trained with just a mid-size, single-speaker corpus and adapted to
new speakers using only a few minutes of training data. Using public mid-size
datasets, our method outperforms the baseline in the VCC 2018 SPOKE task, and
achieves competitive results compared to multi-speaker networks trained on
private large datasets.
Thank you very much for your contribution!
Your judgement is reflected in arXivSearches.json and will be used for VCLab's activity.
Thank you so much.
id: http://arxiv.org/abs/1904.03522v3
judge
Write 'confirmed' or 'excluded' in [] as a comment.