Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: Voice Conversion Based Speaker Normalization for Acoustic Unit Discovery
summary: Discovering speaker independent acoustic units purely from spoken input is
known to be a hard problem. In this work we propose an unsupervised speaker
normalization technique prior to unit discovery. It is based on separating
speaker related from content induced variations in a speech signal with an
adversarial contrastive predictive coding approach. This technique requires
neither transcribed speech nor speaker labels and, furthermore, can be trained
in a multilingual fashion, thus achieving speaker normalization even if only
little unlabeled data is available from the target language. The speaker
normalization is done by mapping all utterances to a medoid style which is
representative for the whole database. We demonstrate the effectiveness of the
approach by conducting acoustic unit discovery with a hidden Markov model
variational autoencoder noting, however, that the proposed speaker
normalization can serve as a front end to any unit discovery system.
Experiments on English, Yoruba and Mboshi show improvements compared to using
non-normalized input.
Thank you very much for your contribution!
Your judgement is reflected in arXivSearches.json and will be used for VCLab's activities.
Thank you so much.
id: http://arxiv.org/abs/2105.01786v1
judge
Write [vclab::confirmed] or [vclab::excluded] in a comment.