rhdunn / cainteoir-engine

The Cainteoir Text-to-Speech core engine
http://reecedunn.co.uk/cainteoir/
GNU General Public License v3.0

support espeak language (dictionary) and voice (phoneme) data #14

Closed: rhdunn closed this issue 11 years ago

rhdunn commented 12 years ago

At the moment, the cainteoir engine links to the external espeak library. This has several problems:

1. There is code in espeak for handling SSML and HTML tags, but cainteoir processes these at a different level before passing the text to espeak.
2. Control over where sentence breaks (pauses) and word breaks occur is poor: espeak is passed text blocks as found, so "1st" is spoken incorrectly because espeak does not recognise it as "first".
3. Lack of control over the dictionary makes it difficult to support adding words or reloading a dictionary while the application is running; this also makes dictionary verifiers hard to implement.
4. Lack of fine-grained control over prosody versus pronunciation, and over the separate phases of espeak, makes it difficult to control processing at this level.
5. Lack of buffer support for returning translation data makes it difficult to embed this functionality in applications, because the API redirects output through a FILE* (see the sketch after this list).
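As a rough illustration of point 5, a buffer or callback based interface would let the host application receive translation data in memory instead of parsing it back out of a redirected stream. The names below are hypothetical, not the espeak or Cainteoir API:

```cpp
// A minimal sketch of a callback-based translation interface, in contrast
// to redirecting output through a FILE*. Hypothetical names throughout.
#include <functional>
#include <iostream>
#include <string>

// The application supplies a sink; the engine delivers translation data
// (e.g. a phoneme string) to it directly, so the data stays in-process.
using translation_sink = std::function<void(const std::string &phonemes)>;

// Hypothetical translator front-end; the body is a placeholder.
void translate(const std::string &text, const translation_sink &sink)
{
    std::string phonemes = "..."; // would come from dictionary + letter-to-sound rules
    sink(phonemes);
}

int main()
{
    translate("1st", [](const std::string &phonemes) {
        std::cout << phonemes << '\n'; // application keeps the result in memory
    });
}
```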

rhdunn commented 11 years ago

The espeak implementation has a different architecture to the one planned for the Cainteoir engine. It combines several phases (syllable analysis, phoneme morphology, etc.), making it hard to isolate and test them individually. The implementation is in C, with a poorly maintained codebase (unused variables, variables all declared at the top, int/1/0 instead of bool/true/false, etc.).

Because of this, it would be better to implement the text-to-speech processing phases in the Cainteoir engine, designed the way I want them: separate layers that are individually tested, document reader event processing, independent language and voice data, and phoneme morphology as a separate phase. Support for the espeak language (dictionary) and voice (phoneme) files can then be provided within this architecture.
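A minimal sketch of that layering, using hypothetical interface names (these are not the actual Cainteoir classes): the language (dictionary) layer, a separate phoneme morphology pass, and the voice (synthesis) layer are independent and can each be unit tested in isolation.

```cpp
#include <string>
#include <vector>

using phoneme_sequence = std::vector<std::string>;

// Language layer: word -> phonemes via exception dictionary / rules.
struct pronunciation_dictionary
{
    virtual ~pronunciation_dictionary() = default;
    virtual phoneme_sequence pronounce(const std::string &word) const = 0;
};

// Separate morphology pass over the phoneme stream (assimilation, elision, ...).
struct phoneme_morphology
{
    virtual ~phoneme_morphology() = default;
    virtual phoneme_sequence apply(const phoneme_sequence &input) const = 0;
};

// Voice layer: phonemes -> audio, independent of the language data above.
struct voice_synthesizer
{
    virtual ~voice_synthesizer() = default;
    virtual void synthesize(const phoneme_sequence &phonemes) = 0;
};

// The engine wires the layers together; each can be mocked and tested on its own.
void speak_word(const std::string &word,
                const pronunciation_dictionary &dict,
                const phoneme_morphology &morph,
                voice_synthesizer &voice)
{
    voice.synthesize(morph.apply(dict.pronounce(word)));
}
```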

MBROLA voices can be properly supported as a set of voices on an external synthesizer with the correct phoneme compatibility map. The pthreads+fork interaction can also be handled correctly when stopping reading.
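For illustration, a phoneme compatibility map of the kind mentioned here could be a simple lookup from the engine's phoneme names to the ones a particular MBROLA voice understands. The sketch below uses hypothetical names and is not the actual Cainteoir or MBROLA data:

```cpp
#include <optional>
#include <string>
#include <unordered_map>

struct phoneme_map
{
    std::unordered_map<std::string, std::string> to_voice;

    // Look up the voice-specific phoneme; unmapped phonemes are reported
    // to the caller so a fallback or error can be applied.
    std::optional<std::string> map(const std::string &engine_phoneme) const
    {
        auto match = to_voice.find(engine_phoneme);
        if (match == to_voice.end()) return std::nullopt;
        return match->second;
    }
};
```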

rhdunn commented 11 years ago

The language/dictionary data support is now being tracked in issue #34 and the voice/phoneme support in issue #36.