Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals
summary: Time- and pitch-scale modifications of speech signals find important applications in speech synthesis, playback systems, voice conversion, learning/hearing aids, etc. There is a requirement for computationally efficient and real-time implementable algorithms. In this paper, we propose a high-quality and computationally efficient time- and pitch-scaling methodology based on the glottal closure instants (GCIs) or epochs in speech signals. The proposed algorithm, termed epoch-synchronous overlap-add time/pitch-scaling (ESOLA-TS/PS), segments speech signals into overlapping short-time frames, aligns adjacent frames with respect to the epochs, and overlap-adds the frames to synthesize time-scale modified speech. Pitch scaling is achieved by resampling the time-scaled speech by a desired sampling factor. We also propose the concept of embedding epochs into speech signals, which facilitates identifying and time-stamping the samples corresponding to epochs and using them for time/pitch-scaling to multiple scaling factors whenever desired, thereby contributing to a faster and more efficient implementation. The results of perceptual evaluation tests reported in this paper indicate the superiority of ESOLA over state-of-the-art techniques. ESOLA significantly outperforms the conventional pitch-synchronous overlap-add (PSOLA) techniques in terms of perceptual quality and intelligibility of the modified speech. Unlike the waveform similarity overlap-add (WSOLA) or synchronous overlap-add (SOLA) techniques, ESOLA can perform exact time-scaling of speech with high quality to any desired modification factor within a range of 0.5 to 2. Compared to synchronous overlap-add with fixed synthesis (SOLAFS), ESOLA is computationally advantageous and at least three times faster.
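The pipeline described in the summary (epoch-aligned overlap-add for time scaling, then resampling for pitch scaling) can be illustrated with a minimal NumPy sketch. This is not the authors' ESOLA implementation: the helper names `esola_like_time_scale` and `pitch_scale`, the 400-sample frame, the fixed half-frame synthesis hop, the Hann window, and the linear-interpolation resampler are all assumptions, and the epoch (GCI) locations are assumed to be given rather than detected.

```python
import numpy as np


def esola_like_time_scale(x, epochs, alpha, frame_len=400):
    """Time-scale x by `alpha` (output length ~ alpha * len(x)).

    Hypothetical, simplified epoch-aligned overlap-add; not the paper's code.
    `epochs` are sorted sample indices of glottal closure instants, assumed
    to be precomputed by some epoch detector.
    """
    x = np.asarray(x, dtype=float)
    epochs = np.asarray(epochs, dtype=int)
    n = frame_len
    hs = n // 2                      # fixed synthesis hop (assumed 50% overlap)
    ha = hs / alpha                  # analysis hop implied by the scale factor
    out_len = int(round(alpha * len(x)))
    # Periodic Hann window: overlapping copies sum to 1 at 50% overlap.
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)
    xp = np.concatenate([x, np.zeros(n)])  # zero-pad so every frame is full length
    y = np.zeros(out_len + n)
    wsum = np.zeros(out_len + n)
    prev_out_epochs = np.array([], dtype=int)

    for k in range(out_len // hs + 1):
        s = k * hs                   # synthesis (output) frame start
        a = int(round(k * ha))       # nominal analysis (input) frame start
        if a >= len(x):
            break
        start = a
        # First input epoch inside the nominal analysis frame.
        in_cands = epochs[(epochs >= a) & (epochs < a + n)]
        # First epoch already placed in the output inside this frame's span.
        out_cands = prev_out_epochs[(prev_out_epochs >= s) & (prev_out_epochs < s + n)]
        if in_cands.size and out_cands.size:
            # Shift the analysis frame so its epoch lands on the output epoch.
            start = int(in_cands[0] - (out_cands[0] - s))
            start = min(max(start, 0), len(x) - 1)
        y[s:s + n] += win * xp[start:start + n]
        wsum[s:s + n] += win
        # Map this frame's epochs to output positions for the next alignment.
        in_frame = epochs[(epochs >= start) & (epochs < start + n)]
        prev_out_epochs = s + (in_frame - start)

    nz = wsum > 1e-8
    y[nz] /= wsum[nz]                # undo the window weighting
    return y[:out_len]


def pitch_scale(x, epochs, beta, frame_len=400):
    """Scale pitch by `beta` while keeping the duration (sketch).

    Time-scale by beta, then resample back to len(x) samples; played at the
    original rate, all frequencies are multiplied by beta. np.interp is a
    crude stand-in for a proper band-limited resampler.
    """
    y = esola_like_time_scale(x, epochs, beta, frame_len)
    t_new = np.linspace(0, len(y) - 1, num=len(x))
    return np.interp(t_new, np.arange(len(y)), y)


fs = 8000
t = np.arange(fs)
x = np.sin(2 * np.pi * 100 * t / fs)
epochs = np.arange(0, fs, 80)        # pretend one GCI per 100 Hz period
slow = esola_like_time_scale(x, epochs, 1.5)
high = pitch_scale(x, epochs, 1.2)
print(len(slow), len(high))          # 12000 8000
```

Aligning each new frame on an epoch already present in the output keeps successive pitch periods in phase, which is the property the summary attributes to ESOLA; with `alpha=1.0` the alignment shift is zero and the sketch reproduces the input.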
Thank you very much for your contribution!
Your judgement is reflected in arXivSearches.json and will be used for VCLab's activities.
Thank you so much.
Please check whether this paper is about 'Voice Conversion' or not.
article info.
title: Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals
summary: Time- and pitch-scale modifications of speech signals find important applications in speech synthesis, playback systems, voice conversion, learning/hearing aids, etc. There is a requirement for computationally efficient and real-time implementable algorithms. In this paper, we propose a high-quality and computationally efficient time- and pitch-scaling methodology based on the glottal closure instants (GCIs) or epochs in speech signals. The proposed algorithm, termed epoch-synchronous overlap-add time/pitch-scaling (ESOLA-TS/PS), segments speech signals into overlapping short-time frames, aligns adjacent frames with respect to the epochs, and overlap-adds the frames to synthesize time-scale modified speech. Pitch scaling is achieved by resampling the time-scaled speech by a desired sampling factor. We also propose the concept of embedding epochs into speech signals, which facilitates identifying and time-stamping the samples corresponding to epochs and using them for time/pitch-scaling to multiple scaling factors whenever desired, thereby contributing to a faster and more efficient implementation. The results of perceptual evaluation tests reported in this paper indicate the superiority of ESOLA over state-of-the-art techniques. ESOLA significantly outperforms the conventional pitch-synchronous overlap-add (PSOLA) techniques in terms of perceptual quality and intelligibility of the modified speech. Unlike the waveform similarity overlap-add (WSOLA) or synchronous overlap-add (SOLA) techniques, ESOLA can perform exact time-scaling of speech with high quality to any desired modification factor within a range of 0.5 to 2. Compared to synchronous overlap-add with fixed synthesis (SOLAFS), ESOLA is computationally advantageous and at least three times faster.
id: http://arxiv.org/abs/1801.06492v1
judge
Write 'confirmed' or 'excluded' inside [] as a comment.