persephone-tools / persephone

A tool for automatic phoneme transcription
Apache License 2.0

4 minor changes to list of phonemes in na.py #204

Open alexis-michaud opened 5 years ago

alexis-michaud commented 5 years ago

(This issue is specific to Yongning Na: preprocessing the XML files)

Taking a (belated) look at persephone/persephone/datasets/na.py, I wonder why the nasal vowels 'ĩ', 'õ', 'ẽ' appear in UNI_PHNS, the set of unitary phonemes ('mono-graphs'): it looks like a (small) mistake. They also appear in BI_PHNS, the set of phonemes composed of 2 symbols ('di-graphs'), which is where they belong. So I guess they should simply be removed from UNI_PHNS.

Also, something that's for me to correct: 'ɻ̃' (in BI_PHNS) and "ɻ̩̃" (in TRI_PHNS) need to be merged into 'ɻ̍̃'. Explanation: instances of 'ɻ̃' without the diacritic indicating syllabic status are mistakes, cases where I was lazy and forgot to add the diacritic. When finalizing the book, I chose to place the diacritic above the letter rather than below it, for clarity. It's a convention of the International Phonetic Alphabet that diacritics normally placed below a symbol may be placed above it when the symbol has a descender.

Likewise, in BI_PHNS, 'ɻ̩' and "ɻ̍" need to be merged into just "ɻ̍".
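
To make the two merges concrete, here is a minimal sketch in Python. It is not code from na.py: the function name `canonicalize_retroflex` is hypothetical, and I am assuming the symbols are encoded as U+027B followed by combining marks (U+0329 for the line below, U+030D for the line above, U+0303 for the tilde), with the syllabicity mark placed before the tilde in the merged form.

```python
import re
import unicodedata

RETROFLEX = "\u027b"   # LATIN SMALL LETTER TURNED R WITH HOOK
SYLL_BELOW = "\u0329"  # COMBINING VERTICAL LINE BELOW (subscript syllabicity mark)
SYLL_ABOVE = "\u030d"  # COMBINING VERTICAL LINE ABOVE (superscript syllabicity mark)
TILDE = "\u0303"       # COMBINING TILDE (nasalization)

# The retroflex followed by one or more of the three marks, in any order.
_MARKED = re.compile(RETROFLEX + "([" + SYLL_BELOW + SYLL_ABOVE + TILDE + "]+)")


def canonicalize_retroflex(text):
    """Rewrite every marked retroflex so that it carries the superscript
    syllabicity mark, keeping the tilde if one was present.
    A bare retroflex with no marks is left untouched."""
    text = unicodedata.normalize("NFC", text)

    def fix(match):
        nasal = TILDE if TILDE in match.group(1) else ""
        return RETROFLEX + SYLL_ABOVE + nasal

    return _MARKED.sub(fix, text)
```

Because the rewrite only looks at which marks are present, running it twice gives the same result as running it once, so it should be safe to apply to texts that already follow the new convention.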

These conventions have now been worked into the current version of the online texts (on GitHub), through this commit.

Finally (for now), double coding of "ṽ̩", "ṽ̩" can now hopefully be taken care of through NFC Unicode normalization: Issue #125
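
For illustration, a small sketch of what NFC normalization does here, using Python's standard `unicodedata` module. The three code-point sequences below are my assumption about how such a symbol could plausibly be typed; I have not checked which of them actually occur in the texts.

```python
import unicodedata

# Code-point sequences that can all render as a syllabic nasalized v.
variants = [
    "v\u0303\u0329",  # v + combining tilde + combining vertical line below
    "v\u0329\u0303",  # same two marks in the opposite order
    "\u1e7d\u0329",   # precomposed v-with-tilde + combining vertical line below
]

# NFC puts the combining marks in canonical order and composes where possible,
# so spellings that differ only in these respects collapse to a single string.
normalized = {unicodedata.normalize("NFC", v) for v in variants}
print(len(set(variants)), "raw spellings ->", len(normalized), "after NFC")
```

If the texts and the phoneme lists are both normalized the same way before comparison, a doubly coded entry like this one should no longer be needed.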

oadams commented 5 years ago

Thanks for these clarifications. The good news is that because those nasal vowels were in the BI_PHNS set, that takes precedence and they were always treated as such.
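
As a rough illustration of that precedence (a simplified sketch, not the actual code in na.py), assume a segmenter that tries the longest inventory first at each position. The function name `segment`, the tiny inventories, and the decomposed encoding of the nasal vowel (i followed by U+0303) are all assumptions made for the example.

```python
def segment(text, tri_phns, bi_phns, uni_phns):
    """Greedy longest-match segmentation: at each position, try a
    3-symbol phoneme, then a 2-symbol one, then a single symbol."""
    phonemes = []
    i = 0
    while i < len(text):
        for size, inventory in ((3, tri_phns), (2, bi_phns), (1, uni_phns)):
            chunk = text[i:i + size]
            if len(chunk) == size and chunk in inventory:
                phonemes.append(chunk)
                i += size
                break
        else:
            # No inventory matched at this position.
            raise ValueError("unknown symbol at position %d: %r" % (i, text[i]))
    return phonemes


# Illustrative inventories only; the nasal vowel is listed in both sets,
# mirroring the duplication described above.
NASAL_I = "i\u0303"
uni = {"m", "i", NASAL_I}
bi = {NASAL_I}
tri = set()

print(segment("m" + NASAL_I, tri, bi, uni))
```

Under these assumptions the two-symbol entry is always checked before the one-symbol set, so the duplicate in UNI_PHNS is never reached and removing it does not change the segmentation.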