summary: The voice conversion challenge is a bi-annual scientific event held to
compare and understand different voice conversion (VC) systems built on a
common dataset. In 2020, we organized the third edition of the challenge and
constructed and distributed a new database for two tasks, intra-lingual
semi-parallel and cross-lingual VC. After a two-month challenge period, we
received 33 submissions, including 3 baselines built on the database. From the
results of crowd-sourced listening tests, we observed that VC methods have
progressed rapidly thanks to advanced deep learning methods. In particular,
speaker similarity scores of several systems turned out to be as high as those of
target speakers in the intra-lingual semi-parallel VC task. However, we confirmed that
none of them have achieved human-level naturalness yet for the same task. The
cross-lingual conversion task is, as expected, a more difficult task, and the
overall naturalness and similarity scores were lower than those for the
intra-lingual conversion task. However, we observed encouraging results, and
the MOS scores of the best systems were higher than 4.0. We also show a few
additional analysis results to aid in understanding cross-lingual VC better.
Thank you very much for your contribution!
Your judgement is reflected in arXivSearches.json and will be used for VCLab's activities.
Thank you so much.
Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion
id: http://arxiv.org/abs/2008.12527v1
judge
Write [vclab::confirmed] or [vclab::excluded] in a comment.