Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: NoiseVC: Towards High Quality Zero-Shot Voice Conversion
summary: Voice conversion (VC) is a task that transforms voice from target audio to
source without losing linguistic contents, it is challenging especially when
source and target speakers are unseen during training (zero-shot VC). Previous
approaches require a pre-trained model or linguistic data to do the zero-shot
conversion. Meanwhile, VC models with Vector Quantization (VQ) or Instance
Normalization (IN) are able to disentangle contents from audios and achieve
successful conversions. However, disentanglement in these models highly relies
on heavily constrained bottleneck layers, thus, the sound quality is
drastically sacrificed. In this paper, we propose NoiseVC, an approach that can
disentangle contents based on VQ and Contrastive Predictive Coding (CPC).
Additionally, Noise Augmentation is performed to further enhance
disentanglement capability. We conduct several experiments and demonstrate that
NoiseVC has a strong disentanglement ability with a small sacrifice of quality.
Thank you very much for your contribution!
Your judgement is reflected in arXivSearches.json and will be used for VCLab's activities.
Thank you so much.
id: http://arxiv.org/abs/2104.06074v1
judge
Write [vclab::confirmed] or [vclab::excluded] in a comment.