timmahrt / pysle

Python interface to ISLEX, an English IPA pronunciation dictionary with syllable and stress marking.

multi word entries give single word pronunciation #3

Closed wassname closed 8 years ago

wassname commented 8 years ago

I'm not sure if this is intended, but multi-word entries only return the first word's pronunciation. E.g. for "australian_seal" we only get the pronunciation for "australian".

from pysle import isletool
isleDict = isletool.LexicalTool('../data/isledict/ISLEdict.txt')

isleDict.data["australian_seal"]
#> ['# ɑ . s t ɹ ˈei l . j n̩ # s i l #']

isleDict.lookup('australian_seal')
#> [([['ɑ'], ['s', 't', 'ɹ', 'ˈei', 'l'], ['j', 'n̩']], [1], [3])]

Thanks for making this package!

timmahrt commented 8 years ago

This turned out to be a tricky issue. The code was cutting off everything after the second '#', which marks the end of the first word. These multi-word entries are relevant because they may showcase stress shift phenomena or alternative pronunciations that don't surface when a word appears alone.
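To illustrate the '#' delimiting described above, here is a minimal sketch of splitting a raw entry into words, syllables, and phones. It is not pysle's actual implementation: the function name parse_isle_pronunciation is hypothetical, and the sketch assumes '#' separates words, '.' separates syllables, spaces separate phones, and only the primary stress mark (U+02C8) is tracked.

```python
# Hypothetical helper, not part of pysle: a sketch of per-word parsing.
PRIMARY_STRESS = "\u02c8"  # the primary stress mark that prefixes a stressed vowel


def parse_isle_pronunciation(entry):
    """Split a raw pronunciation string into one tuple per word.

    Each tuple is (syllables, stressed_syllable_indices, stressed_phone_indices).
    """
    words = []
    # Drop the outer '#' markers, then split on the inner ones (word boundaries).
    for word in entry.strip().strip("#").split("#"):
        # '.' marks syllable boundaries; whitespace separates phones.
        syllables = [syl.split() for syl in word.split(".")]
        syllables = [syl for syl in syllables if syl]
        stressed_syllables = []
        stressed_phones = []
        for syl_idx, syl in enumerate(syllables):
            for phone_idx, phone in enumerate(syl):
                if PRIMARY_STRESS in phone:
                    stressed_syllables.append(syl_idx)
                    stressed_phones.append(phone_idx)
        words.append((syllables, stressed_syllables, stressed_phones))
    return words


# The "australian_seal" entry from the report, written with escapes.
raw = "# \u0251 . s t \u0279 \u02c8ei l . j n\u0329 # s i l #"
parsed = parse_isle_pronunciation(raw)
print(len(parsed))  # 2 words instead of only the first one
```

Stopping at the second '#' would be equivalent to keeping only parsed[0], which is the behavior originally reported.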

Anyhow, lookup() now returns a list of words. So, for the above example, isleDict.lookup('australian_seal') returns:

> [(([[u'\u0251'], [u's', u't', u'\u0279', u'\u02c8ei', u'l'], [u'j', u'n\u0329']], [1], [3]),), (([[u's', u'i', u'l']], [], []),)]

If there is only a single-word match, it still returns a list, just with one element in it. For example, isleDict.lookup('australian') returns:

> [(([[u'\u0251'], [u's', u't', u'\u0279', u'\u02c8ei', u'l'], [u'j', u'n\u0329']], [1], [3]),), ]
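As a rough sketch of how calling code might consume the new return value, the snippet below walks a literal copy of the output quoted above. It assumes each list element is a 1-tuple wrapping (syllables, stressed syllable indices, stressed phone indices), as that output suggests. The helper names syllable_counts and flatten_phones are hypothetical and not part of pysle.

```python
# Literal copy of the lookup('australian_seal') result quoted above.
result = [
    (([["\u0251"], ["s", "t", "\u0279", "\u02c8ei", "l"], ["j", "n\u0329"]], [1], [3]),),
    (([["s", "i", "l"]], [], []),),
]


def syllable_counts(lookup_result):
    """Return the number of syllables in each word of the result."""
    counts = []
    for word in lookup_result:
        # Assumption: the first item of each 1-tuple holds the word's data.
        syllables, _stressed_syllables, _stressed_phones = word[0]
        counts.append(len(syllables))
    return counts


def flatten_phones(lookup_result):
    """Return one flat phone list per word, ignoring syllable boundaries."""
    flat = []
    for word in lookup_result:
        syllables = word[0][0]
        flat.append([phone for syllable in syllables for phone in syllable])
    return flat


print(syllable_counts(result))  # [3, 1]
```

Because a single-word match also comes back as a one-element list, the same loops work unchanged for both cases.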

So, the code is a little more complex, but it has the correct behavior now (link to bugfix).

I added one function to take advantage of this new functionality. It should probably be expanded, but I'm not sure of the best way to go about it at the moment (link to code).