summary: A phonetic posteriorgram (PPG) is a time-varying categorical distribution
over acoustic units of speech (e.g., phonemes). PPGs are a popular
representation in speech generation due to their ability to disentangle
pronunciation features from speaker identity, allowing accurate reconstruction
of pronunciation (e.g., voice conversion) and coarse-grained pronunciation
editing (e.g., foreign accent conversion). In this paper, we demonstrably
improve the quality of PPGs to produce a state-of-the-art interpretable PPG
representation. We train an off-the-shelf speech synthesizer using our PPG
representation and show that high-quality PPGs yield independent control over
pitch and pronunciation. We further demonstrate novel uses of PPGs, such as an
acoustic pronunciation distance and fine-grained pronunciation control.
Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: High-Fidelity Neural Phonetic Posteriorgrams
summary: A phonetic posteriorgram (PPG) is a time-varying categorical distribution over acoustic units of speech (e.g., phonemes). PPGs are a popular representation in speech generation due to their ability to disentangle pronunciation features from speaker identity, allowing accurate reconstruction of pronunciation (e.g., voice conversion) and coarse-grained pronunciation editing (e.g., foreign accent conversion). In this paper, we demonstrably improve the quality of PPGs to produce a state-of-the-art interpretable PPG representation. We train an off-the-shelf speech synthesizer using our PPG representation and show that high-quality PPGs yield independent control over pitch and pronunciation. We further demonstrate novel uses of PPGs, such as an acoustic pronunciation distance and fine-grained pronunciation control.
id: http://arxiv.org/abs/2402.17735v1
judge
Write [vclab::confirmed] or [vclab::excluded] in comment.