rhdunn / espeak

eSpeak NG is an open source speech synthesizer that supports 101 languages and accents.
http://reecedunn.co.uk/espeak-for-android
GNU General Public License v3.0

Support better part of speech disambiguation #93

Open cmrdt opened 7 years ago

cmrdt commented 7 years ago

"live": /lɪv/ (verb), /laɪv/ (adjective, adverb)

it doesn't seem to produce the verb form

rhdunn commented 7 years ago

This is down to the part of speech detection. It works in some cases but not all, e.g.

$ espeak-ng -xq "We are live."
wi:; A@ l'aIv
$ espeak-ng -xq "We are now live."
wi:; A@ n'aU l'Iv
$ espeak-ng -xq "We will live."
wi: wIl l'Iv

The espeak engine marks words that are typically followed by a verb form with a "$verbf" command in the dictionary. This is cheap to process in terms of speed and memory usage, but has limited accuracy (especially in more complex cases).
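
As a rough illustration of this kind of flag-based approach, here is a minimal sketch. It is not eSpeak's actual implementation: the word list `VERB_FOLLOWS`, the `PRONUNCIATIONS` table, and the function `pick_pronunciations` are all hypothetical toy names, and only the two transcriptions of "live" are taken from the output above.

```python
# Toy data, NOT eSpeak's real dictionary: words assumed to signal
# that the next word is probably a verb.
VERB_FOLLOWS = {"will", "would", "shall", "to"}

# Toy table: the default form and the form used when a verb is expected.
PRONUNCIATIONS = {
    "live": {"default": "l'aIv", "verb": "l'Iv"},
}


def pick_pronunciations(sentence):
    """Return (word, pronunciation) pairs for words that have a verb variant."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    expect_verb = False
    result = []
    for word in words:
        entry = PRONUNCIATIONS.get(word)
        if entry is not None:
            form = "verb" if expect_verb else "default"
            result.append((word, entry[form]))
        # The flag only affects the immediately following word.
        expect_verb = word in VERB_FOLLOWS
    return result


print(pick_pronunciations("We will live."))
print(pick_pronunciations("We are live."))
```

Because the decision only looks at the single preceding word, any sentence where the cue is further away or missing falls back to the default form, which is one way the limited accuracy shows up.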

Ideally, eSpeak should support more advanced and accurate part of speech detection algorithms.

cmrdt commented 7 years ago

instead of manually looking up pronunciations and using [[___]] using

cmrdt commented 7 years ago

for heteronyms (wiki:Heteronym) or homographs (wiki:homographs), maybe pay more attention to common words that sound very different, and less to words with only stress variations. since these might be exception-based, there might also be regional variations.

here's a list for more languages: https://en.wiktionary.org/wiki/Category:Heteronyms_by_language

from a rough tally of the en:wiki heteronym pairs: only some of the word pairs are detectable via grammar, and those are mainly verb-noun pairs

rhdunn commented 7 years ago

I have annotated most word pronunciation variations in https://github.com/rhdunn/amepd. Most of these are part of speech based (noun, verb, past form of a verb) -- i.e. "WORD(part-of-speech)". These require a better algorithm to identify the different parts of speech (ideally backed by accurate data). This is what is meant by part of speech disambiguation.
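
To make the "WORD(part-of-speech)" idea concrete, here is a minimal sketch of reading such entries into a lookup table. The line layout (`WORD(tag)` followed by whitespace and a pronunciation), the tag names, and the sample entries are assumptions for illustration and have not been checked against the actual amepd files; `load_entries` and `lookup` are hypothetical helpers.

```python
import re
from collections import defaultdict

# Assumed layout: WORD or WORD(tag), then whitespace, then the pronunciation.
ENTRY = re.compile(r"^(?P<word>[^\s()]+)(?:\((?P<tag>[^)]+)\))?\s+(?P<phonemes>.+)$")


def load_entries(lines):
    """Build {word: {tag: pronunciation}}; untagged entries use the tag 'default'."""
    table = defaultdict(dict)
    for line in lines:
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognised lines
        tag = match.group("tag") or "default"
        table[match.group("word").lower()][tag] = match.group("phonemes").strip()
    return table


def lookup(table, word, pos):
    """Prefer the variant for the given part of speech, else the default, else None."""
    variants = table.get(word.lower(), {})
    return variants.get(pos, variants.get("default"))


# Hypothetical sample entries using the transcriptions from this thread.
sample = [
    "LIVE(verb)  l'Iv",
    "LIVE(adjective)  l'aIv",
    "BOX  b'0ks",
]
table = load_entries(sample)
print(lookup(table, "live", "verb"))
print(lookup(table, "box", "noun"))
```

The lookup itself is trivial; the hard part remains supplying the right `pos` value for a word in context.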

More complex is sense disambiguation (e.g. the noun form of axes can either be aksIz, plural of axe, or aksi:z, plural of axis). I have annotated the amepd with both root and usage annotations to help with these more complex disambiguations.

There are also cases where it is difficult/impossible to determine the pronunciation. For example, given "She put the lead in the box.", is this li:d as in a dog lead, or lEd as in the metal? Both are possible without other information. That information may be in the previous sentences or paragraphs, or may be inferred from context.

cmrdt commented 7 years ago

maybe also, for some cases, a fuzzy-logic-weighted decision, so that it doesn't use rarer word forms unless it is very sure of them, to avoid errors (has this been tested?), e.g. "does" as verb vs plural noun (wiktionary)
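
One way to read this suggestion is as a confidence threshold: keep the most common form unless a rarer one is scored very highly. Below is a minimal sketch under that assumption. The frequencies, the scores, the 0.9 threshold, and the function `choose_pronunciation` are all made up for illustration, and where the part-of-speech scores would come from is left open.

```python
def choose_pronunciation(variants, pos_scores, min_confidence=0.9):
    """Pick a pronunciation, preferring the most frequent variant.

    variants: list of dicts with 'pos', 'ipa' and 'frequency' keys.
    pos_scores: {pos: confidence in [0, 1]} from some hypothetical tagger.
    A rarer variant is used only if its score reaches min_confidence.
    """
    common = max(variants, key=lambda v: v["frequency"])
    best = max(variants, key=lambda v: pos_scores.get(v["pos"], 0.0))
    if best is not common and pos_scores.get(best["pos"], 0.0) >= min_confidence:
        return best["ipa"]
    return common["ipa"]


# Illustrative numbers only; not measured frequencies.
does = [
    {"pos": "verb", "ipa": "dʌz", "frequency": 0.99},
    {"pos": "noun", "ipa": "doʊz", "frequency": 0.01},
]

print(choose_pronunciation(does, {"verb": 0.3, "noun": 0.7}))
print(choose_pronunciation(does, {"verb": 0.05, "noun": 0.95}))
```

With these numbers the first call keeps the common verb form even though the noun scores higher, and only the second call, where the noun score passes the threshold, switches to the rarer form.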

also, caps-case might ambiguously suggest a different meaning and/or pronunciation

& POS also, e.g. march, may, lent, polish

but are names a bit much?
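
The capitalization point above can be sketched as a simple hint function. This is only an illustration under the assumption that a capital letter in mid-sentence suggests a proper noun, while a capital at the start of a sentence says nothing either way; `capitalization_hint` is a hypothetical helper and the tokenization is a plain whitespace split.

```python
def capitalization_hint(tokens, index):
    """Classify a token as 'common', 'proper' or 'ambiguous' from its casing."""
    token = tokens[index]
    if not token[:1].isupper():
        return "common"
    # A capital at the start of a sentence carries no information.
    if index == 0 or tokens[index - 1][-1:] in (".", "!", "?"):
        return "ambiguous"
    return "proper"


tokens = "Polish the table. We met a Polish man in March".split()
for i in (0, 1, 6, 9):
    print(tokens[i], capitalization_hint(tokens, i))
```

The 'ambiguous' case is exactly where capitalization cannot help, so part of speech or other context would still be needed there.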