rhdunn / cainteoir-engine

The Cainteoir Text-to-Speech core engine
http://reecedunn.co.uk/cainteoir/
GNU General Public License v3.0

Use counter-styles to parse and speak number systems. #30

Closed rhdunn closed 10 years ago

rhdunn commented 11 years ago

The CSS Counter Styles Level 3 specification supports defining number formats, including upper and lower Roman numerals and Armenian numerals.

This is useful and can be extended to support other use cases applicable to Cainteoir Text-to-Speech. Specifically:

  1. Parse number systems into a common representation (e.g. western decimal digit form).

    This is useful when pronouncing numbers written in a number system other than the ones natively used by the language being spoken. It also means that the "pronounce number" algorithms can all be defined in terms of the canonical number representation.

  2. Specify how to pronounce numbers.

    This will be a mapping of the language (locale tag) to a counter style. The counter style could be one of the W3C defined styles or could be a custom style.
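
As an illustration of point 1, here is a minimal Python sketch that parses an additive counter style back into an integer. The weight/symbol pairs follow the `additive-symbols` of the predefined `upper-roman` style. The function names `format_additive` and `parse_additive` are hypothetical and not part of Cainteoir, and the parser is only one possible way to invert the formatting algorithm.

```python
# Weight/symbol pairs for the predefined upper-roman counter style,
# listed from the largest weight to the smallest.
UPPER_ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def format_additive(value, symbols):
    """Render a positive integer greedily, largest weight first."""
    if value <= 0:
        raise ValueError("additive styles here only cover positive integers")
    parts = []
    for weight, symbol in symbols:
        count, value = divmod(value, weight)
        parts.append(symbol * count)
    return "".join(parts)


def parse_additive(text, symbols):
    """Hypothetical inverse of format_additive.

    Returns the integer value, or None if the text is not the
    canonical representation of any number in this style.
    """
    total = 0
    pos = 0
    for weight, symbol in symbols:
        # Consume as many copies of this symbol as appear at the cursor.
        while text.startswith(symbol, pos):
            total += weight
            pos += len(symbol)
    if pos != len(text) or total == 0:
        return None
    # Reject non-canonical spellings such as "IIII".
    if format_additive(total, symbols) != text:
        return None
    return total
```

With this sketch, `parse_additive("MCMXCIV", UPPER_ROMAN)` yields the canonical value 1994, which a language-specific "pronounce number" routine could then consume regardless of how the number was written.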

NOTE: The spoken forms should be specified in the same way that the textual forms are encoded (i.e. via counter styles). They should be named "-ctts-spoken-", e.g. -ctts-spoken-french, and be stored in the css/numbers-.css files. They should be capable of encoding the algorithm defined in espeak (src/numbers.cpp) and in other text-to-speech programs, with -ctts-* stylesheet extensions where needed.

NOTE: Some of these spoken forms may have the same structure as their written forms. In those cases the -ctts-spoken-* style should be an alias for the written counter style.

NOTE: There need to be "-ordinal" and "-cardinal" variants for spoken numbers, as well as support for decimals and fractions. There may also need to be support for alternative pronunciations for things like years, times, etc. These can be specified using a "-ctts-number-variant: [ordinal | cardinal | year | time | ...] ;" style.
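
A rough Python sketch of how a locale tag and a number variant might select a spoken counter style. Everything here is an assumption for illustration: the `SPOKEN_STYLES` registry, the `spoken_style` function, the "-ctts-spoken-english" name, and the convention of appending the variant as a suffix are hypothetical. Only the variant keywords and "-ctts-spoken-french" come from the notes above.

```python
# Variant keywords taken from the proposed "-ctts-number-variant" property.
VARIANTS = ("ordinal", "cardinal", "year", "time")

# Hypothetical registry: lowercase locale tag -> base spoken style name.
SPOKEN_STYLES = {
    "fr": "-ctts-spoken-french",
    "en": "-ctts-spoken-english",
}


def spoken_style(locale, variant="cardinal"):
    """Return the style name for a locale and variant, or None if unmapped."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown number variant: {variant}")
    tag = locale.lower()
    while tag:
        if tag in SPOKEN_STYLES:
            # Assumed naming convention: base style name plus variant suffix.
            return f"{SPOKEN_STYLES[tag]}-{variant}"
        # Drop the last subtag and retry, e.g. "fr-ca" falls back to "fr".
        tag = tag.rpartition("-")[0]
    return None
```

Falling back from a regional tag to its primary language keeps the table small, at the cost of ignoring regional differences in how numbers are spoken.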

NOTE: These spoken number forms need to be able to speak very large numbers.
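
To make the large-number requirement concrete, here is a minimal Python sketch that speaks English cardinals by splitting the value into groups of three digits and attaching a scale word to each group. It assumes the short scale (a billion is 10^9) and is not the algorithm used by espeak or Cainteoir. The names `speak_below_1000` and `speak_cardinal` are hypothetical.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
# Short-scale names; index i names the group worth 1000**i.
SCALES = ["", "thousand", "million", "billion", "trillion",
          "quadrillion", "quintillion"]


def speak_below_1000(n):
    """Return the words for 0 <= n < 1000 as a list."""
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        words.append(TENS[tens] if not ones else f"{TENS[tens]}-{ONES[ones]}")
    elif rest or not hundreds:
        words.append(ONES[rest])
    return words


def speak_cardinal(n):
    """Speak a non-negative integer, one three-digit group at a time."""
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    if n == 0:
        return "zero"
    groups = []
    while n:
        n, group = divmod(n, 1000)
        groups.append(group)  # least significant group first
    if len(groups) > len(SCALES):
        raise ValueError("number too large for the scale table")
    words = []
    for index in reversed(range(len(groups))):
        if groups[index] == 0:
            continue  # empty groups are silent
        words += speak_below_1000(groups[index])
        if SCALES[index]:
            words.append(SCALES[index])
    return " ".join(words)
```

The range is bounded only by the `SCALES` table, so supporting larger numbers means adding scale names, not changing the algorithm. Languages that group digits differently would need a different grouping step.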

See Also: http://reecedunn.co.uk/cainteoir/design/numbers.html

rhdunn commented 10 years ago

The counter style spec is not currently flexible enough to handle spoken numbers. The current dictionary-based model with special entries works and should be extended to cover languages it does not currently support fully.