roedoejet / g2p

Grapheme-to-Phoneme transductions that preserve input and output indices, and support cross-lingual g2p!
https://g2p-studio.herokuapp.com

Inconsistent behavior in moh_equiv #383

Open MENGZHEGENG opened 2 weeks ago

MENGZHEGENG commented 2 weeks ago

If à is followed by any character other than :, we get

>>> text="hnàh"
>>> transducer(text).output_string
'hnɑ̀ːh'

However, if à is followed by nothing (that is, it is the last character of the input), we get

>>> text="hnà"
>>> transducer(text).output_string
'hnà'

According to https://github.com/roedoejet/g2p/blob/main/g2p/mappings/langs/moh/moh_equiv.json, the second scenario should satisfy the condition in {"in": "à", "out": "à:", "context_after": "[^:]"}, since à is not followed by :. Based on my understanding, it should therefore output

'hnàː'
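One possible explanation, sketched with plain Python `re` rather than g2p itself: a negated character class such as `[^:]` still has to match one actual character, so it cannot succeed at the end of the input. The sketch below assumes the `context_after` value ends up inside a lookahead; that is an assumption about how the rule is compiled, not something confirmed from the g2p source. The names `A_GRAVE`, `CLASS_CONTEXT`, `LOOKAHEAD_CONTEXT` and `apply_rule` are hypothetical and exist only for this illustration.

```python
import re

# Precomposed U+00E0, written as an escape so the example does not depend
# on how the editor normalizes the character.
A_GRAVE = "\u00e0"

# Hypothetical stand-ins for the rule, NOT g2p's actual compiled patterns.
# [^:] must match one real character, so this fails at end of input.
CLASS_CONTEXT = re.compile(A_GRAVE + r"(?=[^:])")
# A negative lookahead only requires that no colon follows, so it also
# succeeds when nothing follows at all.
LOOKAHEAD_CONTEXT = re.compile(A_GRAVE + r"(?!:)")


def apply_rule(pattern, text):
    """Append ':' to every a-grave matched by the given pattern."""
    return pattern.sub(A_GRAVE + ":", text)


for word in ["hn\u00e0h", "hn\u00e0", "hn\u00e0:"]:
    print(word, apply_rule(CLASS_CONTEXT, word), apply_rule(LOOKAHEAD_CONTEXT, word))
```

With the character-class version, the word-final à is left unchanged, which matches the behavior reported above. With the negative lookahead, it is lengthened. If this is indeed the cause, a context such as `(?!:)` or `[^:]|$` might give the intended result, but that would need to be checked against how g2p handles `context_after`.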

MENGZHEGENG commented 2 weeks ago

A similar phenomenon also occurs with èn, where èn can be the last characters of a syllable.

The same phenomenon also occurs with è or ì, where è or ì is the last character of a syllable.