Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for
Natural-Sounding Voice Conversion
summary: We present an unsupervised non-parallel many-to-many voice conversion (VC)
method using a generative adversarial network (GAN) called StarGAN v2. Using a
combination of adversarial source classifier loss and perceptual loss, our
model significantly outperforms previous VC models. Although our model is
trained only with 20 English speakers, it generalizes to a variety of voice
conversion tasks, such as any-to-many, cross-lingual, and singing conversion.
Using a style encoder, our framework can also convert plain reading speech into
stylistic speech, such as emotional and falsetto speech. Subjective and
objective evaluation experiments on a non-parallel many-to-many voice
conversion task revealed that our model produces natural sounding voices, close
to the sound quality of state-of-the-art text-to-speech (TTS) based voice
conversion methods without the need for text labels. Moreover, our model is
completely convolutional and with a faster-than-real-time vocoder such as
Parallel WaveGAN can perform real-time voice conversion.
Thunk you very much for contribution!
Your judgement is refrected in arXivSearches.json, and is going to be used for VCLab's activity.
Thunk you so much.
Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
summary: We present an unsupervised non-parallel many-to-many voice conversion (VC) method using a generative adversarial network (GAN) called StarGAN v2. Using a combination of adversarial source classifier loss and perceptual loss, our model significantly outperforms previous VC models. Although our model is trained only with 20 English speakers, it generalizes to a variety of voice conversion tasks, such as any-to-many, cross-lingual, and singing conversion. Using a style encoder, our framework can also convert plain reading speech into stylistic speech, such as emotional and falsetto speech. Subjective and objective evaluation experiments on a non-parallel many-to-many voice conversion task revealed that our model produces natural sounding voices, close to the sound quality of state-of-the-art text-to-speech (TTS) based voice conversion methods without the need for text labels. Moreover, our model is completely convolutional and with a faster-than-real-time vocoder such as Parallel WaveGAN can perform real-time voice conversion.
id: http://arxiv.org/abs/2107.10394v1
judge
Write [vclab::confirmed] or [vclab::excluded] in comment.