Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: Robust One-Shot Singing Voice Conversion
summary: Many existing works on singing voice conversion (SVC) require clean
recordings of the target singer's voice for training. However, such recordings
are often difficult to collect in advance, and singing voices are often
distorted by reverb and accompaniment music. In this work, we propose robust
one-shot SVC (ROSVC), which performs any-to-any SVC robustly even on such
distorted singing voices using less than 10 s of a reference voice. To this
end, we propose a two-stage training method called Robustify. In the first
stage, a novel one-shot SVC model based on a generative adversarial network is
trained on clean data to ensure high-quality conversion. In the second stage,
enhancement modules are introduced into the model's encoders to improve
robustness against distortions in the feature space. Experimental results show
that the proposed method outperforms one-shot SVC baselines for both seen and
unseen singers and greatly improves robustness against the distortions.
Thank you very much for your contribution!
Your judgement is reflected in arXivSearches.json and will be used for VCLab's activity.
Thank you so much.
id: http://arxiv.org/abs/2210.11096v1
judge
Write [vclab::confirmed] or [vclab::excluded] in a comment.