rhdunn / cainteoir-engine

The Cainteoir Text-to-Speech core engine
http://reecedunn.co.uk/cainteoir/
GNU General Public License v3.0

Allow control over the way acronyms and special data are processed #29

Closed rhdunn closed 11 years ago

rhdunn commented 11 years ago

Most text-to-speech engines are inflexible in the way they handle special content (ordinals, cardinals, percentages, dates, etc.) and acronyms.

For a given language, the following things need to be provided:

  1. character classification rules -- used for identifying words, ordinals, cardinals, punctuation, abbreviations, etc.
  2. part of speech tagging rules -- used to identify different word forms (e.g. "read" - /r'i:d/ vs /r'Ed/) and associate a variant with each (e.g. "read" -> read/1 (verb) vs read/2 (verb, past))
  3. classified type to word rules -- used to normalize the text stream to a word list (e.g. "St. Noun" -> "saint noun", "Noun St." -> "noun street" and "St. Noun St." -> "saint noun street"; same for "Dr." -> doctor/drive)
  4. pronunciation dictionary -- used to map word/variant to a pronunciation transcription (e.g. read/1 -> /r'i:d/)
  5. acronym dictionary -- used to map acronyms and abbreviations to words
  6. letter to phoneme rules -- used to handle words not in the pronunciation dictionary [*]
  7. phoneme to phoneme rules -- used to handle prosodic morphology (e.g. vowel weakening on unstressed vowels)
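
As an illustration of the classified type to word rules in item 3, here is a minimal sketch that reproduces the "St." and "Dr." examples. The `AMBIGUOUS` table and `expand_abbreviations` function are hypothetical names, not part of Cainteoir, and the rule itself (an abbreviation is a title when a capitalized word follows it, otherwise a street type) is an assumed heuristic that a real engine would need to refine.

```python
# Hypothetical rule table: abbreviation -> (expansion before a name, expansion after a name).
AMBIGUOUS = {
    "st.": ("saint", "street"),
    "dr.": ("doctor", "drive"),
}


def expand_abbreviations(text):
    """Normalize a whitespace-separated text stream to a lowercase word list."""
    tokens = text.split()
    words = []
    for i, token in enumerate(tokens):
        rule = AMBIGUOUS.get(token.lower())
        if rule is None:
            # Not an ambiguous abbreviation: pass the word through.
            words.append(token.lower())
            continue
        before_name, after_name = rule
        next_token = tokens[i + 1] if i + 1 < len(tokens) else None
        # Assumed heuristic: a following capitalized word means the
        # abbreviation is a title ("St. Noun"); otherwise it trails a name
        # ("Noun St.").
        followed_by_name = next_token is not None and next_token[0].isupper()
        words.append(before_name if followed_by_name else after_name)
    return " ".join(words)


print(expand_abbreviations("St. Noun St."))
```

A position-based rule like this is only enough for the examples in the list; sentence-initial words and mixed content would need the character classification and part of speech information from items 1 and 2.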

[*] Strictly speaking, an exception dictionary should be created with any word from the pronunciation dictionary that cannot be constructed using the letter to phoneme and phoneme to phoneme rules. This allows the exception dictionary to be small and the letter to phoneme rules to be tested and verified against a reference set of words.
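
The exception dictionary described above could be derived along these lines. This is a sketch under stated assumptions: `build_exception_dictionary` is a hypothetical helper, the two rule stages are passed in as plain functions, and the toy letter table and transcriptions are made up for the example rather than taken from any real rule set.

```python
def build_exception_dictionary(pronunciations, letter_to_phoneme, phoneme_to_phoneme):
    """Return the entries of `pronunciations` that the rules cannot reproduce."""
    exceptions = {}
    for word, expected in pronunciations.items():
        # Run the word through both rule stages, as the engine would for
        # a word that is missing from the pronunciation dictionary.
        predicted = phoneme_to_phoneme(letter_to_phoneme(word))
        if predicted != expected:
            exceptions[word] = expected
    return exceptions


# Toy rules, for illustration only: map each letter independently and
# leave the phoneme string unchanged.
LETTERS = {"c": "k"}


def toy_letter_to_phoneme(word):
    return "".join(LETTERS.get(ch, ch) for ch in word)


def toy_phoneme_to_phoneme(phonemes):
    return phonemes


reference = {"cat": "kat", "one": "wVn"}
print(build_exception_dictionary(reference, toy_letter_to_phoneme, toy_phoneme_to_phoneme))
```

The same loop doubles as a regression test for the rules: the size of the returned dictionary shows how many reference words the rules still get wrong.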

It should be possible to choose the classification scheme and abbreviation rules for the document being read, for example using email/SMS abbreviations when reading email documents.

Where possible, the text-to-speech engine should select appropriate defaults, but this behaviour should be overridable (e.g. suppressing US state abbreviation expansion on addresses).

For the UI, this could be handled as a drop-down with a list of profiles ("email/sms", "novel", "technical", "chess", etc.)
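
One possible shape for such profiles, with defaults that can be overridden per document, is sketched below. Everything here is an assumption for illustration: the `Profile` class, the `expand_us_states` flag, `select_profile`, `expand` and the sample abbreviation entries are hypothetical and do not describe an existing Cainteoir API.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Profile:
    name: str
    # Abbreviation -> expansion, specific to this kind of document.
    abbreviations: dict = field(default_factory=dict)
    # Default behaviour that a caller may switch off.
    expand_us_states: bool = True


# Profile names follow the drop-down list suggested above; the entries
# are illustrative only.
PROFILES = {
    "novel": Profile("novel"),
    "email/sms": Profile(
        "email/sms",
        abbreviations={"btw": "by the way", "imo": "in my opinion"},
    ),
}


def select_profile(name, **overrides):
    """Return a copy of the named profile with any defaults overridden."""
    return replace(PROFILES[name], **overrides)


def expand(word, profile):
    """Expand `word` if the profile knows it, otherwise return it unchanged."""
    return profile.abbreviations.get(word.lower(), word)


email = select_profile("email/sms")
no_states = select_profile("novel", expand_us_states=False)
print(expand("BTW", email), no_states.expand_us_states)
```

Keeping the stored profiles immutable and returning a modified copy means an override for one document does not leak into the defaults used for the next.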

rhdunn commented 11 years ago

Deferring this as it is too broad and needs implementation experience to better define the scope. That is, it is better to have smart and accurate content detection. This is especially true when the content is mixed (e.g. a chess match performed over email).