Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders
summary: Neural speech codecs have recently gained widespread attention in generative
speech modeling domains such as voice conversion and text-to-speech synthesis.
However, ensuring high-fidelity audio reconstruction of speech codecs under
high compression rates remains an open and challenging issue. In this paper, we
propose PromptCodec, a novel end-to-end neural speech codec model using
disentangled representation learning based feature-aware prompt encoders. By
incorporating additional feature representations from prompt encoders,
PromptCodec can distribute the speech information requiring processing and
enhance its capabilities. Moreover, a simple yet effective adaptive feature
weighted fusion approach is introduced to integrate features of different
encoders. Meanwhile, we propose a novel disentangled representation learning
strategy based on cosine distance to optimize PromptCodec's encoders to ensure
their efficiency, thereby further improving the performance of PromptCodec.
Experiments on LibriTTS demonstrate that our proposed PromptCodec consistently
outperforms state-of-the-art neural speech codec models under all different
bitrate conditions while achieving impressive performance with low bitrates.
Thank you very much for your contribution!
Your judgement is reflected in arXivSearches.json and will be used for VCLab's activity.
Thank you so much.
id: http://arxiv.org/abs/2404.02702v1
judge
Write [vclab::confirmed] or [vclab::excluded] in a comment.