quadrismegistus / prosodic

Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.
http://quadrismegistus.github.io/prosodic/
GNU General Public License v3.0

Phoneme /g/ is represented by different characters in words from `./dicts/en/english.tsv` and words transcribed using TTS #45

Closed evgenykochetkov closed 1 year ago

evgenykochetkov commented 1 year ago
>>> import prosodic as p
>>> text = p.Text("google good")
000001  google                  P:'ɡʉː.ɡʌl                              S:PU    W:HH
000002  good                    P:'gʊd                                  S:P     W:H
>>> text.ents(cls='Word')[0].children[0]
<Syllable.goo> ['ɡʉː]
>>> text.ents(cls='Word')[0].children[0].children[0].onset
<Onset> [ɡ]
>>> text.ents(cls='Word')[0].children[0].children[0].onset.children[0]
ɡ
>>> text.ents(cls='Word')[0].children[0].children[0].onset.children[0].feats
{}
>>> text.ents(cls='Word')[1].children[0].children[0].onset.children[0]
g
>>> text.ents(cls='Word')[1].children[0].children[0].onset.children[0].feats
{'approx': False, 'cons': True, 'son': False, 'syll': False, 'constr': False, 'spread': False, 'voice': True, 'long': None, 'cont_acoust': False, 'cont_artic': False, 'delrel': False, 'lat': False, 'nas': False, 'strid': False, 'tap': False, 'trill': False, 'coronal': False, 'dorsal': True, 'labial': False, 'labiodental': False, 'ant': False, 'dist': False, 'back': True, 'front': None, 'high': True, 'low': False, 'tense': None, 'round': False}

The g in "good" is represented by a regular 'g' character (U+0067, LATIN SMALL LETTER G) and correctly loads its features from ./lib/ipa.py.

The ɡ's in "google" are represented by LATIN SMALL LETTER SCRIPT G (U+0261) and, as a result, have no feats.
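
For anyone who wants to confirm the mismatch outside of prosodic, here is a minimal sketch using only the standard library. It prints the two code points and shows one possible workaround: mapping U+0261 to U+0067 before any feature lookup. `normalize_ipa_g` is a hypothetical helper written for this illustration and is not part of prosodic; the direction of the mapping assumes that the feature table is keyed on U+0067, which is what the output above suggests.

```python
import unicodedata

ASCII_G = "\u0067"   # LATIN SMALL LETTER G
SCRIPT_G = "\u0261"  # LATIN SMALL LETTER SCRIPT G (the IPA glyph)

# Hypothetical helper, not part of prosodic: map the script g onto the
# plain g so both transcription sources use the same character.
_G_TABLE = str.maketrans({SCRIPT_G: ASCII_G})


def normalize_ipa_g(transcription: str) -> str:
    """Return the transcription with every U+0261 replaced by U+0067."""
    return transcription.translate(_G_TABLE)


# The two characters look alike but are distinct code points.
for ch in (ASCII_G, SCRIPT_G):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Unicode normalization alone does not unify them, because U+0261 has no
# compatibility decomposition; an explicit mapping is needed.
nfkc_script_g = unicodedata.normalize("NFKC", SCRIPT_G)

# Transcription of "google" as shown in the output above.
google_ipa = "'\u0261ʉː.\u0261ʌl"
fixed = normalize_ipa_g(google_ipa)
print(fixed)
```

Normalizing in the other direction (U+0067 to U+0261, with the feature table keyed on U+0261) would work equally well; what matters is that the dictionary and the TTS transcriptions agree on one character.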