Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: A Cyclical Post-filtering Approach to Mismatch Refinement of Neural
Vocoder for Text-to-speech Systems
summary: Recently, the effectiveness of text-to-speech (TTS) systems combined with
neural vocoders to generate high-fidelity speech has been shown. However,
collecting the required training data and building these advanced systems from
scratch is time and resource consuming. A more economical approach is to
develop a neural vocoder to enhance the speech generated by existing TTS
systems. Nonetheless, this approach usually suffers from two issues: 1)
temporal mismatches between TTS and natural waveforms and 2) acoustic
mismatches between training and testing data. To address these issues, we adopt
a cyclic voice conversion (VC) model to generate temporally matched pseudo-VC
data for training and acoustically matched enhanced data for testing the neural
vocoders. Because of the generality, this framework can be applied to arbitrary
neural vocoders. In this paper, we apply the proposed method with a
state-of-the-art WaveNet vocoder for two different TTS systems, and both
objective and subjective experimental results confirm the effectiveness of the
proposed framework.
Thank you very much for your contribution!
Your judgement is reflected in arXivSearches.json and will be used for VCLab's activities.
Thank you so much.
id: http://arxiv.org/abs/2005.08659v1
judge
Write [vclab::confirmed] or [vclab::excluded] in a comment.