rhdunn / cainteoir-engine

The Cainteoir Text-to-Speech core engine
http://reecedunn.co.uk/cainteoir/
GNU General Public License v3.0

Support reading dates and times #44

Open rhdunn opened 11 years ago

rhdunn commented 11 years ago

This involves detecting years, months, days, hours, minutes and seconds, and associating the correct pronunciation with them.

This won't detect all uses of these, but the more that can be detected the better the reading experience will be.

We should also look at how different languages represent dates (both written and pronounced), e.g. Japanese and Chinese use <number>{year}<number>{moon}<number>{sun} where {...} represents the associated Han character.

Years

Years less than 2000 should be pronounced in digit pairs, e.g. 1876 should be pronounced as 18 76. This can be done in tts/word_stream by setting the number scale to 100 (i.e. 2 digits) and not pronouncing an "and" between the groups.
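As a rough illustration of the digit-pair idea, here is a minimal Python sketch. The function name is hypothetical and this is not the tts/word_stream API; it only shows that a scale of 100 splits a year into two-digit groups, which would then be read as separate numbers with no "and" between them.

```python
from typing import List


def year_digit_pairs(year: int) -> List[int]:
    """Split a year into base-100 groups, most significant group first."""
    if year < 0:
        raise ValueError("year must be non-negative")
    groups = []
    while True:
        # Peel off the lowest two digits (scale of 100).
        year, low = divmod(year, 100)
        groups.append(low)
        if year == 0:
            break
    return groups[::-1]


print(year_digit_pairs(1876))  # [18, 76]
```

How a zero or single-digit low group should be spoken (e.g. 1900 or 1905) is not covered by this issue, so the sketch leaves that decision to the caller.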

Years in isolation can be detected in some contexts. For example, 1960s denotes a decade (technically a range of years), so the number in it can be read as a year.
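One way the decade case could be detected is with a small pattern match. This is only a sketch: the names are hypothetical, and it assumes a decade is written as four digits ending in 0 followed by a lowercase "s".

```python
import re
from typing import Optional

# Assumption: four digits ending in 0, then "s" (e.g. "1960s", "1970s").
DECADE_RE = re.compile(r"(\d{3}0)s")


def decade_start_year(token: str) -> Optional[int]:
    """Return the first year of the decade, or None if token is not a decade."""
    match = DECADE_RE.fullmatch(token)
    return int(match.group(1)) if match else None


print(decade_start_year("1960s"))  # 1960
```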

Months

Months can be abbreviated (e.g. Apr for April). The abbreviations should be defined in a months.dict file per locale. The file should also contain the long-form names so that the full set can be used to identify dates correctly.

NOTE: The months.dict data should only be used to detect date formats and not arbitrarily expand the month abbreviations (e.g. Jan can also be a person's name).
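A small Python sketch of the "only in date context" rule follows. The in-memory table is a hypothetical stand-in for an English months.dict (the real file format is not defined here), and the context check is deliberately simplistic: it only accepts a month name that is directly followed by a number.

```python
from typing import List, Optional

LONG_NAMES = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]

# Hypothetical stand-in for months.dict: long names plus 3-letter abbreviations.
MONTHS = {}
for number, name in enumerate(LONG_NAMES, start=1):
    MONTHS[name.lower()] = number
    MONTHS[name[:3].lower()] = number


def month_number(token: str) -> Optional[int]:
    """Return 1-12 for a month name or abbreviation (optional trailing '.')."""
    if token.endswith("."):
        token = token[:-1]
    return MONTHS.get(token.lower())


def is_month_in_date_context(tokens: List[str], i: int) -> bool:
    """True if tokens[i] names a month and the next token is a day or year."""
    if month_number(tokens[i]) is None:
        return False
    following = tokens[i + 1] if i + 1 < len(tokens) else ""
    return following.rstrip(",").isdigit()


print(is_month_in_date_context(["Jan", "12"], 0))    # True
print(is_month_in_date_context(["Jan", "said"], 0))  # False
```

With this check, "Jan said hello" is left alone while "Jan 12" is treated as a date.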

Days

Days are written as cardinal numbers, but spoken as ordinal numbers.
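For English, the cardinal-to-ordinal step for day numbers could look like the sketch below. The names are hypothetical, the word list is English only, and other locales would need their own data.

```python
ORDINALS_1_TO_19 = [
    "first", "second", "third", "fourth", "fifth", "sixth", "seventh",
    "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth",
    "fourteenth", "fifteenth", "sixteenth", "seventeenth", "eighteenth",
    "nineteenth",
]

# Cardinal and ordinal forms of the tens that can occur in a day of the month.
TENS = {2: ("twenty", "twentieth"), 3: ("thirty", "thirtieth")}


def day_ordinal(day: int) -> str:
    """Return the spoken English ordinal for a day of the month (1-31)."""
    if not 1 <= day <= 31:
        raise ValueError("day must be in the range 1-31")
    if day < 20:
        return ORDINALS_1_TO_19[day - 1]
    tens, units = divmod(day, 10)
    cardinal, ordinal = TENS[tens]
    if units == 0:
        return ordinal
    return f"{cardinal} {ORDINALS_1_TO_19[units - 1]}"


print(day_ordinal(10))  # tenth
print(day_ordinal(21))  # twenty first
```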

Hours, Minutes, Seconds

  1. <hours>:<minutes>:<seconds> is pronounced as <hours> <minutes> and <seconds> seconds.
  2. <hours>:00:<seconds> is pronounced as <hours> hours and <seconds> seconds.
  3. <hours>:<minutes>pm is pronounced as <hours> <minutes> p m.
  4. <hours>:00pm is pronounced as <hours> p m.

A time may also be followed by AM/PM and by a timezone (BST, PST, etc.). All of these are pronounced as abbreviations (e.g. p m), not as words.
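The four rules above can be sketched in Python as follows. The function and pattern names are hypothetical, numbers are left as digit strings (turning them into words is a separate step), and timezones are not handled here. What to say for a time with zero minutes and no am/pm marker is not specified above, so the sketch simply emits the hour.

```python
import re
from typing import List

TIME_RE = re.compile(
    r"(?P<h>\d{1,2}):(?P<m>\d{2})(?::(?P<s>\d{2}))?\s*(?P<ampm>am|pm)?",
    re.IGNORECASE,
)


def spoken_time(text: str) -> List[str]:
    """Return the tokens to speak for a time such as '7:26 pm' or '10:00:30'."""
    match = TIME_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a recognised time: {text!r}")
    minutes = int(match.group("m"))
    seconds = match.group("s")
    words = [str(int(match.group("h")))]
    if seconds is not None:
        if minutes == 0:
            # Rule 2: <hours> hours and <seconds> seconds
            words += ["hours", "and", str(int(seconds)), "seconds"]
        else:
            # Rule 1: <hours> <minutes> and <seconds> seconds
            words += [str(minutes), "and", str(int(seconds)), "seconds"]
    elif minutes != 0:
        # Rule 3: <hours> <minutes>; rule 4 (zero minutes) adds nothing here
        words.append(str(minutes))
    if match.group("ampm"):
        # am/pm is spoken letter by letter: "pm" becomes "p", "m"
        words += list(match.group("ampm").lower())
    return words


print(spoken_time("7:26 pm"))   # ['7', '26', 'p', 'm']
print(spoken_time("10:00:30"))  # ['10', 'hours', 'and', '30', 'seconds']
```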

Date/Time Formats

2013-01-28
1970s

The formats:

Wed Apr 10, 2013 7:26 pm
Friday, December 21, 2018 4:05 PM EST
Apr 12, 1:50am BST
SEPTEMBER 16
Wednesday, November 19, 2003
November 19, 2003
Mid-June to mid-September, 2004
September 2002
Monday, Nov. 18

all share a common format:

MONTH_NAME := (SHORT_MONTH_NAME '.'?) | LONG_MONTH_NAME
DATE := (DAY_OF_WEEK ','?)? MONTH_NAME (DAY (',' YEAR)? | YEAR)?
TIME := HOURS ':' MINUTES ('am' | 'pm')? ('est' | 'bst' | ...)?
DATETIME := DATE (','? TIME)?
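As a sanity check of the grammar against the listed examples, here is a Python regular-expression sketch. It makes some assumptions that go beyond the rules as written: the am/pm marker and the timezone are treated as optional, a comma is allowed between the date and the time (needed for "Apr 12, 1:50am BST"), matching is case-insensitive, and the timezone list is a tiny hypothetical stand-in for the elided one. All names are illustrative.

```python
import re
from typing import Dict, Optional

MONTHS_LONG = ("January|February|March|April|May|June|July|August|"
               "September|October|November|December")
MONTHS_SHORT = "Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DAYS = ("Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday|"
        "Mon|Tue|Wed|Thu|Fri|Sat|Sun")
TIMEZONES = "EST|BST|PST"  # assumption: stand-in for the full list

DATETIME_RE = re.compile(rf"""
    (?:(?P<weekday>{DAYS}),?\s+)?                   # (DAY_OF_WEEK ','?)?
    (?P<month>{MONTHS_LONG}|(?:{MONTHS_SHORT})\.?)  # MONTH_NAME
    (?:\s+(?:
        (?P<day>\d{{1,2}})(?:,\s*(?P<year>\d{{4}}))?  # DAY (',' YEAR)?
      | (?P<year_only>\d{{4}})                        # or YEAR alone
    ))?
    (?:,?\s+(?P<hours>\d{{1,2}}):(?P<minutes>\d{{2}})  # TIME
       \s*(?P<ampm>am|pm)?
       (?:\s+(?P<tz>{TIMEZONES}))?
    )?
""", re.VERBOSE | re.IGNORECASE)


def parse_datetime(text: str) -> Optional[Dict[str, str]]:
    """Return the matched fields as strings, or None if the text does not fit."""
    match = DATETIME_RE.fullmatch(text.strip())
    if match is None:
        return None
    fields = {k: v for k, v in match.groupdict().items() if v is not None}
    if "year_only" in fields:
        fields["year"] = fields.pop("year_only")
    return fields


print(parse_datetime("September 2002"))
# {'month': 'September', 'year': '2002'}
```

Under these assumptions every listed example parses except "Mid-June to mid-September, 2004", which is a range and would need a rule of its own.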
