Thank you very much for your contribution!
Your judgement is reflected in arXivSearches.json and is going to be used for VCLab's activity.
Thank you so much.
Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: Non-Parallel Voice Conversion with Augmented Classifier Star Generative Adversarial Networks
summary: This paper proposes a method that allows for non-parallel multi-domain voice conversion (VC) by using a variant of generative adversarial networks (GANs) called StarGAN. The main features of our method, which we term StarGAN-VC, are as follows: First, it requires no parallel utterances, transcriptions, or time alignment procedures for speech generator training. Second, it can simultaneously learn mappings across multiple domains using a single generator network so that it can fully use available training data collected from multiple domains by capturing common latent features that can be shared across different domains. Third, it is able to generate converted speech signals quickly enough to allow real-time implementations and requires only several minutes of training examples to generate reasonably realistic-sounding speech. In this paper, we describe three formulations of StarGAN, including a newly introduced novel StarGAN variant called "Augmented classifier StarGAN (A-StarGAN)", and compare them in a non-parallel VC task. We also compare them with several baseline methods.
id: http://arxiv.org/abs/2008.12604v1
judge
Write [vclab::confirmed] or [vclab::excluded] in a comment.