The eSpeak text-to-speech program uses a combination of Klatt (see issue #35) parameters, recorded wave audio and spectral parameters. These are coordinated by generating a sequence of wave commands from the voice data.
A system like eSpeak's wave command processing would be useful, as it would allow Cainteoir Engine to support multiple synthesis techniques.
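As a rough illustration of the idea, here is a minimal sketch of a command queue that dispatches each command to a backend chosen by its kind. Every name in it (`CommandKind`, `WaveCommand`, `silent_backend`, `render_commands`) is hypothetical and does not come from eSpeak or Cainteoir Engine, and the backends only emit silence as placeholders for real synthesis.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, List


class CommandKind(Enum):
    # Hypothetical kinds, one per synthesis technique named in this issue.
    KLATT = auto()
    WAVE = auto()
    SPECTRAL = auto()


@dataclass
class WaveCommand:
    kind: CommandKind
    duration_ms: int
    # Technique-specific parameters; the layout is an assumption.
    payload: dict = field(default_factory=dict)


# A backend turns one command into a list of audio samples.
Backend = Callable[[WaveCommand], List[float]]


def silent_backend(sample_rate: int) -> Backend:
    """Placeholder backend that emits silence for the command's duration."""
    def render(cmd: WaveCommand) -> List[float]:
        return [0.0] * (sample_rate * cmd.duration_ms // 1000)
    return render


def render_commands(commands: List[WaveCommand],
                    backends: Dict[CommandKind, Backend]) -> List[float]:
    """Render the commands in order, dispatching each to its backend."""
    samples: List[float] = []
    for cmd in commands:
        samples.extend(backends[cmd.kind](cmd))
    return samples
```

With this shape, adding a synthesis technique means registering one more backend rather than changing the code that walks the command sequence.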
The wave file processing here looks different to diphone wave concatenation, but I do not know the details well enough to say for certain.
The spectral synthesis algorithm is similar to the way that Klatt works, but uses a different mathematical model to produce the waveforms.
The Klatt synthesis path is just an interface from the wave commands to the Klatt synthesizer (issue #35), which has been modified to integrate better with eSpeak.
There is also support for MBROLA voices in the wave commands, which passes the pho file phonemes and prosody data to an mbrola process that does the synthesis. This does not fit the design of Cainteoir Engine, where it should be handled at a higher level -- that is, as a separate synthesizer.
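To show what a separate MBROLA synthesizer would need to produce, here is a small sketch that formats phoneme and prosody data as pho text. It assumes the usual pho line layout of a phoneme name, a duration in milliseconds, and optional pairs of position (percent of the duration) and pitch (Hz). The helpers `pho_line` and `to_pho` are hypothetical, and the sketch does not start or talk to an mbrola process.

```python
from typing import Iterable, Sequence, Tuple

# (position as a percentage of the phoneme duration, pitch in Hz)
PitchPoint = Tuple[int, int]


def pho_line(phoneme: str, duration_ms: int,
             pitch_points: Sequence[PitchPoint] = ()) -> str:
    """Format one phoneme as a pho line (assumed layout, see above)."""
    fields = [phoneme, str(duration_ms)]
    for position_percent, pitch_hz in pitch_points:
        fields.append(str(position_percent))
        fields.append(str(pitch_hz))
    return " ".join(fields)


def to_pho(segments: Iterable[Tuple[str, int, Sequence[PitchPoint]]]) -> str:
    """Join the formatted segments into pho text, one phoneme per line."""
    return "".join(pho_line(*segment) + "\n" for segment in segments)
```

For example, `pho_line("a", 120, [(0, 110), (100, 130)])` gives `a 120 0 110 100 130`. A synthesizer built this way sits beside the wave-command renderer instead of inside it.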