rhdunn / cainteoir-engine

The Cainteoir Text-to-Speech core engine
http://reecedunn.co.uk/cainteoir/
GNU General Public License v3.0

Create a phoneme model for internally representing phoneme data #38

Closed rhdunn closed 11 years ago

rhdunn commented 11 years ago

The description of phonemes should be modelled on phoneme features (e.g. those described in Kirshenbaum's ASCII-IPA document) rather than on specific transcription schemes.

This allows the language dictionary, letter-to-phoneme rules, phoneme-to-phoneme rules, and the set of phonemes supported by a voice to all use different transcription schemes yet still work together.
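
As a rough illustration (not the engine's actual API), a feature-based phoneme could be as simple as a set of Kirshenbaum-style feature abbreviations; a minimal C++ sketch:

```cpp
#include <initializer_list>
#include <set>
#include <string>

// Illustrative sketch only: a phoneme described purely by its articulatory
// features (Kirshenbaum-style abbreviations), independent of any
// transcription scheme.
struct phoneme
{
	std::set<std::string> features;

	phoneme(std::initializer_list<std::string> aFeatures)
		: features(aFeatures)
	{
	}

	bool has(const std::string &aFeature) const
	{
		return features.count(aFeature) != 0;
	}
};

// /d/ is a voiced alveolar stop:
static const phoneme d = { "vcd", "alv", "stp" };
```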

It also allows the creation of phoneme transcription converters (e.g. ASCII-IPA to Unicode IPA) -- create one under src/apps.

The data/phonemes/mkchart.py program converts a phoneme transcription set description file into a chart similar to the IPA chart. This should really be moved out into a proper GUI editor/viewer (part of a cainteoir-editor program?).

The data/phonemes/phoneme-features.csv file provides a list of phoneme features modelled on the Kirshenbaum feature list. It has been expanded to cover more of the IPA chart, but currently lacks diacritic support.
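
Since the abbreviations follow Kirshenbaum, the rows plausibly pair each abbreviation with the feature it names, along these lines (the actual column layout of the file is an assumption here):

```
vcd,voiced
vls,voiceless
blb,bilabial
alv,alveolar
stp,stop
vwl,vowel
hgh,high
fnt,front
unr,unrounded
lng,long
```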

There are also *.phon files under data/phonemes that support different transcription schemes to limited degrees.

Support for this should be moved into libcainteoir, with tests for the different transcription schemes (testing the transcription-to-feature mapping). For example, the test file:

di:p

would have the feature list:

{vcd,alv,stp}
{lng,hgh,fnt,unr,vwl}
{vls,blb,stp}

and vice versa for the ascii-ipa transcription.

There should be a transcription_scheme class that loads a phoneme transcription scheme file and handles the mapping of the transcription to/from the underlying model. This can then be used by the dictionary and letter-to-phoneme file processors to convert to the underlying phoneme model.
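
A rough interface sketch for such a class, assuming phonemes are modelled as sets of Kirshenbaum-style feature abbreviations (the method names to_features and to_transcription are illustrative assumptions, not an existing API):

```cpp
#include <set>
#include <string>
#include <vector>

typedef std::set<std::string> feature_set; // e.g. { "vcd", "alv", "stp" }

class transcription_scheme
{
public:
	// Load a *.phon transcription scheme description file.
	explicit transcription_scheme(const std::string &aSchemeFile);

	// Map a transcription (e.g. "di:p") onto the underlying phoneme model.
	std::vector<feature_set> to_features(const std::string &aTranscription) const;

	// Map phonemes from the underlying model back to this transcription.
	std::string to_transcription(const std::vector<feature_set> &aPhonemes) const;
};
```

The "di:p" test above would then be a round trip through this interface (the file name below is hypothetical; the issue only says *.phon files live under data/phonemes):

```cpp
#include <cassert>

int main()
{
	transcription_scheme ascii_ipa("data/phonemes/ascii-ipa.phon");

	std::vector<feature_set> phonemes = ascii_ipa.to_features("di:p");
	// phonemes: {vcd,alv,stp} {lng,hgh,fnt,unr,vwl} {vls,blb,stp}
	assert(ascii_ipa.to_transcription(phonemes) == "di:p");
	return 0;
}
```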

The voice synthesizers will accept phonemes expressed in the phoneme model and map them to the voice's own phonemes using their own transcription_scheme data.

The transcription scheme converter will use two instances of the transcription_scheme class to map between transcriptions. This includes being able to load voice-specific transcription schemes.
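
With the hypothetical interface above, the converter reduces to a feature-set hand-off between the two schemes:

```cpp
// Transcription-to-transcription conversion via the shared phoneme model.
std::string convert(const transcription_scheme &aFrom,
                    const transcription_scheme &aTo,
                    const std::string &aTranscription)
{
	return aTo.to_transcription(aFrom.to_features(aTranscription));
}
```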

The "output phonemes as text" mode of the text-to-speech engine is just a matter of feeding the phonemes into the appropriate transcription scheme instead of sending them to a voice synthesizer. In fact, there could be a voice synthesizer implementation that fees the phonemes to the transcription scheme and then either outputs to the console (stdout) or a file -- for example transcription_synthesizer(FILE *output).

rhdunn commented 11 years ago

This has been implemented.