Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram
Conversion
summary: Non-parallel voice conversion (VC) is a technique for learning mappings
between source and target speeches without using a parallel corpus. Recently,
cycle-consistent adversarial network (CycleGAN)-VC and CycleGAN-VC2 have shown
promising results regarding this problem and have been widely used as benchmark
methods. However, owing to the ambiguity of the effectiveness of
CycleGAN-VC/VC2 for mel-spectrogram conversion, they are typically used for
mel-cepstrum conversion even when comparative methods employ mel-spectrogram as
a conversion target. To address this, we examined the applicability of
CycleGAN-VC/VC2 to mel-spectrogram conversion. Through initial experiments, we
discovered that their direct applications compromised the time-frequency
structure that should be preserved during conversion. To remedy this, we
propose CycleGAN-VC3, an improvement of CycleGAN-VC2 that incorporates
time-frequency adaptive normalization (TFAN). Using TFAN, we can adjust the
scale and bias of the converted features while reflecting the time-frequency
structure of the source mel-spectrogram. We evaluated CycleGAN-VC3 on
inter-gender and intra-gender non-parallel VC. A subjective evaluation of
naturalness and similarity showed that for every VC pair, CycleGAN-VC3
outperforms or is competitive with the two types of CycleGAN-VC2, one of which
was applied to mel-cepstrum and the other to mel-spectrogram. Audio samples are
available at
http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc3/index.html.
Thank you very much for your contribution!
Your judgement is reflected in arXivSearches.json and will be used for VCLab's activities.
Thank you so much.
id: http://arxiv.org/abs/2010.11672v1
judge
Write [vclab::confirmed] or [vclab::excluded] in a comment.