Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: DurIAN-SC: Duration Informed Attention Network based Singing Voice
Conversion System
summary: Singing voice conversion is converting the timbre in the source singing to
the target speaker's voice while keeping singing content the same. However,
singing data for target speaker is much more difficult to collect compared with
normal speech data. In this paper, we introduce a singing voice conversion
algorithm that is capable of generating high quality target speaker's singing
using only his/her normal speech data. First, we manage to integrate the
training and conversion process of speech and singing into one framework by
unifying the features used in standard speech synthesis system and singing
synthesis system. In this way, normal speech data can also contribute to
singing voice conversion training, making the singing voice conversion system
more robust especially when the singing database is small. Moreover, in order to
achieve one-shot singing voice conversion, a speaker embedding module is
developed using both speech and singing data, which provides target speaker
identity information during conversion. Experiments indicate the proposed singing
conversion system can convert source singing to target speaker's high-quality
singing with only 20 seconds of target speaker's enrollment speech data.
Thank you very much for your contribution!
Your judgement is reflected in arXivSearches.json and will be used for VCLab's activities.
Thank you so much.
id: http://arxiv.org/abs/2008.03009v1
judge
Write [vclab::confirmed] or [vclab::excluded] in a comment.