rhdunn / cainteoir-engine

The Cainteoir Text-to-Speech core engine
http://reecedunn.co.uk/cainteoir/
GNU General Public License v3.0

Support contexts for dictionary entries #55

Open rhdunn opened 10 years ago

rhdunn commented 10 years ago

The context of a dictionary entry can be used to disambiguate words that have the same spelling but different pronunciations:

| context | description/usage |
| --- | --- |
| after-noun | the word occurs after a noun |
| before-noun | the word occurs before a noun |
| date | the word is part of a written date (e.g. 31st Jan) |
| feminine | the person/thing is known to be female |
| masculine | the person/thing is known to be male (e.g. Dutch male name) |
| noun | the word is a noun |
| number | the word is a number |
| spelling | the word is a letter and is used when spelling out words |
| stressed | the word is emphasised |
| unstressed | the word is not emphasised |
| verb | the word is a verb |
| verb-past | the word is the past tense of a verb |

For example:

```
jan /dZ'an/ [feminine] # female name
jan /j'an/ [masculine] # e.g. Dutch male name
jan january [date]

i /'aI/
i 1 [number] # roman numeral

a /'eI/ [spelling] [stressed]
a /@/ [unstressed]

st street [after-noun] # e.g. Bridge St.
st saint [before-noun] # e.g. St. Helen

lead /l'i:d/ [verb]
lead /l'Ed/ [noun] [verb-past]
```
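To make the entry format above concrete, here is a minimal parser and lookup sketch. It assumes (this is not stated in the issue) that an entry matches when ANY of its contexts is active, since `lead /l'Ed/ [noun] [verb-past]` cannot need both at once; that the first matching entry wins; and that a context-free entry such as `i /'aI/` is the fallback. `Entry`, `parse_entry` and `select` are hypothetical names, not part of the Cainteoir engine.

```python
import re
from dataclasses import dataclass, field

# A [context] tag such as [noun] or [verb-past].
CONTEXT_RE = re.compile(r"\[([a-z-]+)\]")


@dataclass(frozen=True)
class Entry:
    word: str
    pronunciation: str  # phonemes such as /l'Ed/, or a replacement word such as "january"
    contexts: frozenset = field(default_factory=frozenset)


def parse_entry(line):
    """Parse `word pronunciation [context] ... # comment`; None for blank/comment lines."""
    body = line.split("#", 1)[0].strip()
    if not body:
        return None
    parts = body.split(None, 2)
    if len(parts) < 2:
        raise ValueError(f"missing pronunciation: {line!r}")
    rest = parts[2] if len(parts) == 3 else ""
    return Entry(parts[0], parts[1], frozenset(CONTEXT_RE.findall(rest)))


def select(entries, word, active):
    """Choose a pronunciation for `word` given the set of active contexts.

    Assumed rule: the first entry sharing ANY context with `active` wins;
    otherwise fall back to the first context-free entry, then the first entry.
    """
    candidates = [e for e in entries if e.word == word]
    if not candidates:
        return None
    for e in candidates:
        if e.contexts & active:
            return e.pronunciation
    for e in candidates:
        if not e.contexts:
            return e.pronunciation
    return candidates[0].pronunciation


DICT = """\
jan /dZ'an/ [feminine] # female name
jan /j'an/ [masculine] # e.g. Dutch male name
jan january [date]
i /'aI/
i 1 [number] # roman numeral
lead /l'i:d/ [verb]
lead /l'Ed/ [noun] [verb-past]
"""
entries = [e for e in map(parse_entry, DICT.splitlines()) if e]
print(select(entries, "lead", {"verb-past"}))  # /l'Ed/
print(select(entries, "i", set()))             # /'aI/
print(select(entries, "jan", {"date"}))        # january
```

Note that with these rules `jan` with no active context falls back to its first entry, because every `jan` entry carries a context; the issue's later `word@default` proposal addresses exactly that case.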

Contexts can also be used to part-of-speech tag other words that do not have ambiguous pronunciations, which helps with disambiguation.
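As a rough illustration of how tags on unambiguous words could feed back into disambiguation, the sketch below derives the `after-noun` and `before-noun` contexts for a token from its neighbours' tags (as needed for `st` above). The `neighbour_contexts` function and the `TAGS` table are hypothetical; it assumes lowercased tokens and that `noun` tags come from unambiguous dictionary entries.

```python
def neighbour_contexts(tokens, tags, i):
    """Return the contexts implied for tokens[i] by its neighbours' tags.

    `tags` is a hypothetical mapping word -> set of contexts (e.g. {"noun"}).
    A token between two nouns gets both contexts and stays ambiguous.
    """
    active = set()
    if i > 0 and "noun" in tags.get(tokens[i - 1], set()):
        active.add("after-noun")
    if i + 1 < len(tokens) and "noun" in tags.get(tokens[i + 1], set()):
        active.add("before-noun")
    return active


TAGS = {"bridge": {"noun"}, "helen": {"noun"}}  # hypothetical tags

print(neighbour_contexts(["bridge", "st"], TAGS, 1))  # {'after-noun'}
print(neighbour_contexts(["st", "helen"], TAGS, 0))   # {'before-noun'}
```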

To avoid duplicating context entries for each pronunciation, and to keep the existing dictionary format stable, contexts will be stored in a separate context dictionary with the format:

```
word word@variant [context] ... [context] # optional comment
```
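A line in this format could be parsed as in the minimal sketch below. `parse_context_line` is a hypothetical helper, and two rules are assumptions rather than part of the proposal: the target must name the same word (e.g. `lead lead@2`), and variants are numbered from 1, following the `word@1 ... word@n` listing.

```python
import re

CONTEXT_RE = re.compile(r"\[([a-z-]+)\]")
VARIANT_RE = re.compile(r"(\S+)@([0-9]+)")


def parse_context_line(line):
    """Parse `word word@variant [context] ... [context] # optional comment`.

    Returns (word, variant, contexts), or None for blank/comment-only lines.
    """
    body = line.split("#", 1)[0].strip()
    if not body:
        return None
    parts = body.split(None, 2)
    match = VARIANT_RE.fullmatch(parts[1]) if len(parts) >= 2 else None
    # Assumption: the target must name the same word, e.g. `lead lead@2`.
    if match is None or match.group(1) != parts[0]:
        raise ValueError(f"expected 'word word@variant': {line!r}")
    variant = int(match.group(2))
    # Assumption: variants are numbered from 1, as in word@1 ... word@n.
    if variant < 1:
        raise ValueError(f"variant must be a natural number: {line!r}")
    rest = parts[2] if len(parts) == 3 else ""
    return parts[0], variant, CONTEXT_RE.findall(rest)


print(parse_context_line("lead lead@2 [noun] [verb-past] # the metal"))
# ('lead', 2, ['noun', 'verb-past'])
```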

Here, `variant` is a natural number that identifies one pronunciation (context/form) of the word. The dictionary will then have the following entries:

```
word word@default
word@1 ...
...
word@n ...
```

where `default` refers to the default pronunciation of the word, i.e. the one used when no identifying context can be deduced. This is usually the most common form, or the most easily identified one.
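The lookup this implies can be sketched as follows, reading `word@default` as a placeholder for the number of the default variant (an interpretation, not confirmed by the issue). Both dictionaries and the `resolve` function are hypothetical; the first variant whose contexts overlap the active ones wins, and otherwise the plain `word` entry redirects to the default variant.

```python
# Hypothetical context dictionary: (word, variant, contexts).
CONTEXT_DICT = [
    ("lead", 1, ["verb"]),
    ("lead", 2, ["noun", "verb-past"]),
]

# Hypothetical pronunciation dictionary. The plain "lead" key redirects to
# the default variant; "i" has no variants and maps straight to phonemes.
PRONUNCIATIONS = {
    "lead": "lead@1",
    "lead@1": "/l'i:d/",
    "lead@2": "/l'Ed/",
    "i": "/'aI/",
}


def resolve(word, active, context_dict=CONTEXT_DICT, pron_dict=PRONUNCIATIONS):
    """Return the pronunciation of `word` for the set of active contexts."""
    # A variant whose contexts overlap the active ones takes priority.
    for entry_word, variant, contexts in context_dict:
        if entry_word == word and active.intersection(contexts):
            return pron_dict[f"{word}@{variant}"]
    # No identifying context: follow `word -> word@<default>` if present.
    target = pron_dict.get(word)
    if target is None:
        return None
    return pron_dict.get(target, target)


print(resolve("lead", {"noun"}))  # /l'Ed/
print(resolve("lead", set()))     # /l'i:d/
```

Keeping the contexts out of the pronunciation dictionary means each pronunciation is written once, and adding a context only touches the context dictionary.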
