Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: Universal Adaptor: Converting Mel-Spectrograms Between Different
Configurations for Speech Synthesis
summary: Most recent speech synthesis systems are composed of a synthesizer and a
vocoder. However, the existing synthesizers and vocoders can only be matched to
acoustic features extracted with a specific configuration. Hence, we can't
combine arbitrary synthesizers and vocoders together to form a complete system,
not to mention apply to a newly developed model. In this paper, we proposed
Universal Adaptor, which takes a Mel-spectrogram parametrized by the source
configuration and converts it into a Mel-spectrogram parametrized by the target
configuration, as long as we feed in the source and the target configurations.
Experiments show that the quality of speeches synthesized from our output of
Universal Adaptor is comparable to those synthesized from ground truth
Mel-spectrogram no matter in single-speaker or multi-speaker scenarios.
Moreover, Universal Adaptor can be applied in the recent TTS systems and voice
conversion systems without dropping quality.
Thank you very much for your contribution!
Your judgement is reflected in arXivSearches.json and will be used for VCLab's activities.
Thank you so much.
id: http://arxiv.org/abs/2204.00170v2
judge
Write [vclab::confirmed] or [vclab::excluded] in a comment.