Multiple problematic entries in the existing RPY tables have been discovered:
PinYin
r,r
ㄏng,hng
ng,ng
o,o
ㄧo,yo
Wade-Giles
r,Erh
ㄏng,Hng
ng,Ng
o,O
ㄧo,Yo
Yale
r,Er
ㄏng,Hng
ng,Ng
o,O
ㄧo,Yo
Some of these were originally in the form: 0072,r (0072 is the hex code point of "r"). Different RPYs had the same line in different forms (some in hex, others in Latin), so this is simply a remnant of the hex->unicode conversion. I've converted all of these to Latin.
What are these Latin->Latin entries? Rare and/or exceptional Pinyin syllables:
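The normalization can be sketched roughly like this. It is only an illustration: it assumes every RPY line has the form key,value and that a hex key is a run of 4-digit code points. normalize_line is a hypothetical helper, not something that exists in the tool.

```python
import re

# Assumption: a hex key is one or more 4-digit hex code points, e.g. "0072".
HEX_KEY = re.compile(r"(?:[0-9A-Fa-f]{4})+")

def normalize_line(line):
    """Rewrite a hypothetical 'key,value' line so that a hex key becomes text."""
    key, sep, value = line.partition(",")
    if sep and HEX_KEY.fullmatch(key):
        # Decode every 4-digit group into the character it encodes.
        key = "".join(chr(int(key[i:i + 4], 16)) for i in range(0, len(key), 4))
    return key + sep + value

print(normalize_line("0072,r"))  # r,r
print(normalize_line("ng,ng"))   # ng,ng (already text, left alone)
```

A key that is itself four hex-looking letters would be misread by this check, so a real conversion would need to know which tables were stored in hex.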
Sources quote this list of exceptions: er, -r; n; ng, hng, m, hm.
There are several groups of characters which I had to deal with separately.
o/Yo syllables
These are normal but rare. CC-EDICT has no words with an o- syllable, but a few with yo-, all interjections:
哎唷 哎唷 [ai1 yo1]
哎喲 哎哟 [ai1 yo1]
啊喲 啊哟 [a1 yo5]
Wiktionary confirms these readings and gives these bopomofo transcriptions for the syllables:
yo1 = ㄧㄛ
o1 = ㄛ
This doesn't contradict anything in the RPYs and agrees with other sources. These bopomofo syllables are otherwise unused, so they seem to be right. I've simply added them to all RPYs.
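The "unused" check before adding an entry could look something like this. It is a sketch only: the table is modeled as a plain dict from bopomofo to romanization, and add_if_unused is a hypothetical name.

```python
def add_if_unused(table, bopomofo, roman):
    """Add bopomofo -> roman unless either side is already taken."""
    if bopomofo in table or roman in table.values():
        return False
    table[bopomofo] = roman
    return True

# Tiny stand-in for a PinYin table; a real RPY has many more entries.
pinyin = {"ㄧㄠ": "yao", "ㄡ": "ou"}

print(add_if_unused(pinyin, "ㄧㄛ", "yo"))  # True
print(add_if_unused(pinyin, "ㄛ", "o"))     # True
print(add_if_unused(pinyin, "ㄡ", "o"))     # False, ㄡ is already "ou"
```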
n, ng/hng, m/hm
These are valid exceptional syllables. CC-EDICT has no words with n1/2/3/4/5 or ng1/2/3/4/5.
ㄥ (eng) and ㄤ (ang) sometimes sound like ng in composition, but they are not how these syllables are written.
We already have n/en: ㄣ. Logically, n/ㄣ + g/ㄍ = ng/ㄣㄍ. This seems to be right (e.g. Makutung / 麻枯ㄊㄨㄣㄍ).
h is ㄏ, and we already had it in the entry "ㄏng,hng", so h + ng = ㄏㄣㄍ. It could have been ㄏㄥ, but that is already "heng".
m is ㄇ, and there are no conflicting interpretations of it.
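Spelled out as data, the composition above comes to roughly the following. This is a sketch; EXCEPTIONS is a made-up name, and hm is left out because its bopomofo form is not spelled out above.

```python
# Letters taken from the reasoning above.
N, G, H, M = "ㄣ", "ㄍ", "ㄏ", "ㄇ"

# Exceptional syllables built from those letters.
EXCEPTIONS = {
    "n": N,
    "ng": N + G,
    "hng": H + N + G,
    "m": M,
}

HENG = "ㄏㄥ"  # already taken by the regular syllable "heng"

# None of the new entries may reuse the spelling of "heng".
assert HENG not in EXCEPTIONS.values()

print(EXCEPTIONS["ng"])   # ㄣㄍ
print(EXCEPTIONS["hng"])  # ㄏㄣㄍ
```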
er, -r
This is the worst of them all. Chinese has a feature called Erhua (https://en.wikipedia.org/wiki/Erhua), where some words end in an additional R. In bopomofo this is represented by adding ㄦ (the er syllable) with tone mark 5 or without a tone mark (== tone 1 in bopomofo).
Note that the full syllable 儿 is never pronounced by itself in either the first tone or the neutral tone, so in either case there is no possible ambiguity with the full syllable ㄦ.
In short, 儿, 儿1, 儿5 == final R; 儿2,儿3,儿4 == ER syllable.
Pinyin represents this by simply adding "r", either without a tone (== tone 5 in Pinyin), or with tone 1, or with explicit tone 5. Wade-Giles does the same.
CC-EDICT in particular has enough entries with an "r5" syllable, usually as a reading for the characters 兒 and/or 儿.
Implemented this by adding ugly conditionals to the RPY. Something like this (paraphrased):
儿,儿1,儿5 -> -r
儿2,儿3,儿4 -> er
r* -> 儿*
er* -> 儿*
In other words:
儿 is converted to "-r" or "er" depending on the tone that follows. 儿 without a tone is 儿1 by definition.
"r" and "er" are simply converted to 儿. The tone that follows, whatever it is, will simply be converted to that tone in bopomofo.
Non-standard Bopomofo letters
Wikipedia says:
Three letters formerly used in non-standard dialects of Mandarin are now also used to write other Chinese varieties. Some Zhuyin fonts do not contain these letters.
ㄪ v v v
ㄫ ŋ ng ng
ㄬ ɲ gn ny
I'm not sure whether these "non-standard dialects NG/HNG" and the above ㄣㄍ/ㄏㄣㄍ that many sources list as "exception syllables in Pinyin" are conceptually the same thing. Maybe someone with better knowledge can tell. For now I'm treating them as different cases.
It seems that these NG/HNG were sometimes previously written with 儿 [ŋ] ng too! Probably in these dialects 儿 represented "ng" and not "-r".
These characters are now officially in Unicode but are still very uncommon, and most fonts don't have them.
I'm not going to bother with them for now unless someone asks for them.
If someone does ask, one option is to add them as a separate RPY, so that 1. they work across all RPYs and 2. you can prioritize what you want your ng to mean (ㄏㄣㄍ/ㄫ).
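A rough sketch of how such a layered lookup might behave, assuming tables are consulted in priority order and the first match wins. The lookup function and both tables are hypothetical.

```python
def lookup(key, tables):
    """Return the first match from tables, searched in priority order."""
    for table in tables:
        if key in table:
            return table[key]
    return None

# Hypothetical tables, romanization -> bopomofo direction.
base = {"ng": "ㄣㄍ", "hng": "ㄏㄣㄍ"}
extra = {"ng": "ㄫ", "v": "ㄪ"}

print(lookup("ng", [base, extra]))  # ㄣㄍ
print(lookup("ng", [extra, base]))  # ㄫ
print(lookup("v", [base, extra]))   # ㄪ
```

Swapping the order of the two tables is what would let a user choose which meaning of ng takes priority.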
Original report by me.