rhdunn / cainteoir-engine

The Cainteoir Text-to-Speech core engine
http://reecedunn.co.uk/cainteoir/
GNU General Public License v3.0

Implement the eSpeak voice synthesizer #36

Open rhdunn opened 11 years ago

rhdunn commented 11 years ago

The eSpeak text-to-speech program uses a combination of Klatt parameters (see issue #35), recorded wave audio and spectral parameters. These are coordinated by generating a sequence of wave commands from the voice data.

A system like the wave command processing would be useful, as it would allow Cainteoir Engine to support multiple synthesis techniques.
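As a rough illustration of the idea, here is a minimal Python sketch of a command processor that routes each command to whichever synthesis backend is registered for it. Everything in it (`WaveCommand`, `CommandProcessor`, the `"silence"` and `"wave"` kinds, the payload keys) is hypothetical and is not taken from eSpeak or Cainteoir Engine; it only shows how one command stream could drive several techniques.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class WaveCommand:
    kind: str      # hypothetical command kind, e.g. "wave" or "silence"
    payload: dict  # backend-specific parameters (hypothetical layout)


# A backend turns one command payload into a list of audio samples.
Backend = Callable[[dict], List[float]]


class CommandProcessor:
    """Dispatches each command to the backend registered for its kind."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, kind: str, backend: Backend) -> None:
        self._backends[kind] = backend

    def run(self, commands: List[WaveCommand]) -> List[float]:
        samples: List[float] = []
        for command in commands:
            backend = self._backends.get(command.kind)
            if backend is None:
                raise ValueError(f"no backend for command kind {command.kind!r}")
            samples.extend(backend(command.payload))
        return samples


def silence_backend(payload: dict) -> List[float]:
    # Emits `length` zero-valued samples.
    return [0.0] * payload["length"]


def recorded_wave_backend(payload: dict) -> List[float]:
    # Copies pre-recorded samples straight into the output.
    return list(payload["samples"])


processor = CommandProcessor()
processor.register("silence", silence_backend)
processor.register("wave", recorded_wave_backend)

output = processor.run([
    WaveCommand("silence", {"length": 2}),
    WaveCommand("wave", {"samples": [0.5, -0.5]}),
])
```

A Klatt or spectral backend would be registered the same way, so adding a technique would not require changes to the command loop.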

The wave file processing here looks different to diphone wave concatenation, but I do not know the details well enough to say for certain.

The spectral synthesis algorithm is similar to the way that Klatt works, but uses a different mathematical model to produce the waveforms.
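To make the contrast concrete, here is a hedged Python sketch of the two kinds of model. The first function is the standard second-order digital resonator used in Klatt-style formant synthesis. The second assumes that the spectral approach builds voiced sounds by summing sine harmonics; that is an assumption about eSpeak's method, not something stated in this issue, and neither function is taken from the eSpeak source. The function names and parameters are hypothetical.

```python
import math
from typing import List, Sequence


def klatt_resonator(signal: Sequence[float], freq_hz: float,
                    bandwidth_hz: float, sample_rate: float) -> List[float]:
    """Second-order resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * freq_hz * t))
    a = 1.0 - b - c  # normalises the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out: List[float] = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def additive_harmonics(f0_hz: float, amplitudes: Sequence[float],
                       length: int, sample_rate: float) -> List[float]:
    """Sums sine harmonics of f0; amplitudes[k] scales harmonic k + 1."""
    out: List[float] = []
    for n in range(length):
        t = n / sample_rate
        out.append(sum(
            amp * math.sin(2.0 * math.pi * f0_hz * (k + 1) * t)
            for k, amp in enumerate(amplitudes)
        ))
    return out
```

In the first model a formant is a filter applied to a source signal; in the second, under the assumption above, a formant would be expressed through the relative amplitudes of the harmonics.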

The Klatt synthesis algorithm is just an interface from the wave commands to the Klatt synthesizer (issue #35), which has been modified to better integrate with eSpeak.

There is also support for MBROLA voices in the wave commands, which passes the pho file phonemes and prosody data to an mbrola process that does the synthesis. This does not make sense for the design of the Cainteoir Engine; it should be done at a higher level, that is, as a separate synthesizer.
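For reference, a separate MBROLA synthesizer would mostly need to turn phonemes and prosody into pho text itself. Here is a minimal Python sketch of that step, assuming the usual pho line layout of a phoneme name, a duration in milliseconds and optional (position %, pitch Hz) pairs. `PhoEntry` and `to_pho` are hypothetical names, the phoneme symbols are placeholders that depend on the MBROLA voice, and the sketch does not launch the mbrola process.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PhoEntry:
    phoneme: str      # voice-specific phoneme name (placeholder values below)
    duration_ms: int  # duration in milliseconds
    # Optional pitch targets as (position in percent, pitch in Hz) pairs.
    pitch_points: List[Tuple[int, int]] = field(default_factory=list)


def to_pho(entries: List[PhoEntry]) -> str:
    """Formats entries as pho text, one phoneme per line."""
    lines = []
    for entry in entries:
        parts = [entry.phoneme, str(entry.duration_ms)]
        for position, pitch_hz in entry.pitch_points:
            parts += [str(position), str(pitch_hz)]
        lines.append(" ".join(parts))
    return "\n".join(lines) + "\n"


pho_text = to_pho([
    PhoEntry("_", 50),
    PhoEntry("a", 120, [(0, 110), (100, 130)]),
])
```

The resulting text is what would be handed to mbrola, without going through the wave commands.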

rhdunn commented 11 years ago

Depends on #38 (Phoneme Model)