rhdunn / cainteoir-engine

The Cainteoir Text-to-Speech core engine
http://reecedunn.co.uk/cainteoir/
GNU General Public License v3.0

Make the text-to-speech chain flexible #46

Open rhdunn opened 11 years ago

rhdunn commented 11 years ago

It should be possible to specify the call chain of the different text analysis stages:

  1. text reader -- splits the document events into words, numbers and punctuation;
  2. context analysis -- identifies the type of punctuation (comma, etc.) and number (ordinal, year, etc.);
  3. word stream -- converts the numbers and audible punctuation to words;
  4. part of speech tagger -- tags words with their associated part of speech;
  5. part of speech disambiguator -- resolves ambiguous part of speech tag assignments.

It should be possible to build a pipeline of these stages, and others, in an arbitrary order. This means:

  1. Creating an abstract class:

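    struct text_event_reader
    {
       // Virtual destructor so implementations can be destroyed through the base class.
       virtual ~text_event_reader() = default;

       // The event produced by the last successful call to read().
       virtual const text_event &event() const = 0;

       // Advance to the next event; returns false when there are no more events.
       virtual bool read() = 0;
    };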
  2. Having all the analysis parts above implement the abstract class.
  3. Making the classes take a std::shared_ptr<text_event_reader> instead of a std::shared_ptr<document_reader> (except for text_reader which starts the process).
  4. Optionally hiding them behind a createXYZ method.
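
The steps above can be sketched roughly as follows. This is only an illustration of the chaining idea, not the engine's code: the `text_event` layout, the `token_source`, `digit_speller` and `lowercaser` stages, the `create_*` helpers and `demo_pipeline` are all hypothetical stand-ins for the real analysis parts.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified event; the real text_event carries more information.
enum class event_type { word, number };
struct text_event { event_type type; std::string text; };

struct text_event_reader
{
    virtual ~text_event_reader() = default;
    virtual const text_event &event() const = 0;
    virtual bool read() = 0;
};

// Stand-in for text_reader: starts the chain from a fixed token list
// rather than from a document_reader.
class token_source : public text_event_reader
{
public:
    explicit token_source(std::vector<std::string> tokens) : tokens_(std::move(tokens)) {}
    const text_event &event() const override { return current_; }
    bool read() override
    {
        if (next_ == tokens_.size()) return false;
        const std::string &token = tokens_[next_++];
        bool digits = !token.empty();
        for (unsigned char c : token) digits = digits && std::isdigit(c) != 0;
        current_ = { digits ? event_type::number : event_type::word, token };
        return true;
    }
private:
    std::vector<std::string> tokens_;
    std::size_t next_ = 0;
    text_event current_{ event_type::word, {} };
};

// Toy "word stream" stage: spells out single-digit numbers.
class digit_speller : public text_event_reader
{
public:
    explicit digit_speller(std::shared_ptr<text_event_reader> source) : source_(std::move(source)) { assert(source_); }
    const text_event &event() const override { return current_; }
    bool read() override
    {
        if (!source_->read()) return false;
        current_ = source_->event();
        if (current_.type == event_type::number && current_.text.size() == 1)
        {
            static const char *names[] = { "zero", "one", "two", "three", "four",
                                           "five", "six", "seven", "eight", "nine" };
            current_ = { event_type::word, names[current_.text[0] - '0'] };
        }
        return true;
    }
private:
    std::shared_ptr<text_event_reader> source_;
    text_event current_{ event_type::word, {} };
};

// Toy normalisation stage: lower-cases the text of every event.
class lowercaser : public text_event_reader
{
public:
    explicit lowercaser(std::shared_ptr<text_event_reader> source) : source_(std::move(source)) { assert(source_); }
    const text_event &event() const override { return current_; }
    bool read() override
    {
        if (!source_->read()) return false;
        current_ = source_->event();
        for (char &c : current_.text)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return true;
    }
private:
    std::shared_ptr<text_event_reader> source_;
    text_event current_{ event_type::word, {} };
};

// createXYZ-style helpers that hide the concrete classes (step 4).
std::shared_ptr<text_event_reader> create_digit_speller(std::shared_ptr<text_event_reader> source)
{
    return std::make_shared<digit_speller>(std::move(source));
}
std::shared_ptr<text_event_reader> create_lowercaser(std::shared_ptr<text_event_reader> source)
{
    return std::make_shared<lowercaser>(std::move(source));
}

// Builds source -> digit_speller -> lowercaser and collects the resulting text.
std::vector<std::string> demo_pipeline()
{
    auto source = std::make_shared<token_source>(std::vector<std::string>{ "Chapter", "7", "Begins" });
    auto chain = create_lowercaser(create_digit_speller(source));
    std::vector<std::string> words;
    while (chain->read()) words.push_back(chain->event().text);
    return words; // "chapter", "seven", "begins"
}
```

Because every stage both implements and consumes `text_event_reader`, the order in `demo_pipeline` can be changed, or a stage dropped or replaced, without touching the other classes.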

This will be a performance hit (it needs measuring to see how much), but it adds flexibility -- especially for using different grapheme-to-phoneme rules.
