himselfv / wakan

Japanese and Chinese learning tool with dictionary

Rare and weird syllables in Pinyin tables #293

Closed himselfv closed 5 years ago

himselfv commented 5 years ago

Original report by me.

Multiple problematic entries in the existing RPY tables have been discovered:

PinYin

r,r
ㄏng,hng
ng,ng
o,o
ㄧo,yo

Wade-Giles

r,Erh
ㄏng,Hng
ng,Ng
o,O
ㄧo,Yo

Yale

r,Er
ㄏng,Hng
ng,Ng
o,O
ㄧo,Yo

Some of these were originally written in the form 0072,r (0072 being the hex code of "r"). Since different RPYs had the same line in different forms (some in hex, others in Latin), this is simply a remnant of the hex->Unicode conversion. I've converted all of them to Latin.
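
The hex-to-Latin cleanup described above could be sketched roughly like this. This is only an illustration in Python, not the code used in Wakan: the function names are made up, and it assumes each line has the form key,value with the key either plain text or a run of 4-digit hex code units.

```python
import re

# One UTF-16 code unit written as exactly four hex digits, e.g. "0072".
HEX4 = re.compile(r"[0-9A-Fa-f]{4}")


def normalize_key(key: str) -> str:
    """Decode a key made only of 4-digit hex groups; return other keys unchanged.

    Caveat (assumption): a plain-text key that happens to consist of four
    hex letters would be misread as hex. Real tables may need a stricter rule.
    """
    groups = [key[i:i + 4] for i in range(0, len(key), 4)]
    if key and len(key) % 4 == 0 and all(HEX4.fullmatch(g) for g in groups):
        return "".join(chr(int(g, 16)) for g in groups)
    return key


def normalize_line(line: str) -> str:
    """Normalize the part before the first comma and keep the rest as-is."""
    key, sep, rest = line.partition(",")
    return normalize_key(key) + sep + rest


print(normalize_line("0072,r"))
print(normalize_line("ng,Ng"))
```

With this, a line such as 0072,r becomes r,r, while lines that are already in Latin pass through untouched.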

What are these Latin->Latin entries? Rare and/or exceptional Pinyin syllables.

Sources quote this list of exceptions: er, -r; n; ng, hng, m, hm.

There are several groups of characters which I had to deal with separately.

o/Yo syllables

These are normal but rare. CC-CEDICT has no words with an o- syllable, but a few with yo-, all interjections:

  哎唷 哎唷 [ai1 yo1]
  哎喲 哎哟 [ai1 yo1]
  啊喲 啊哟 [a1 yo5]

Wiktionary confirms these readings and gives these bopomofo transcriptions for the syllables:

  yo1 = ㄧㄛ
  o1 = ㄛ

This doesn't contradict anything in the RPYs and agrees with other sources. These bopomofo syllables were otherwise unused in the tables, so the readings seem to be right. I've simply added them to all RPYs.

n, ng/hng, m/hm

These are valid exceptional syllables. CC-CEDICT has no words with n1/2/3/4/5 or ng1/2/3/4/5.

There are ㄥ (eng) and ㄤ (ang), which in composition sometimes sound like ng, but they are not how these syllables are written.

We already have n/en: ㄣ. Logically,

er, -r

This is the worst of them all. Chinese has a feature called Erhua (https://en.wikipedia.org/wiki/Erhua) where some words may end in an additional R. In bopomofo this is represented by adding ㄦ (the er syllable) either with tone mark 5 or without a tone mark (== tone 1 in bopomofo).

Note that the full syllable 儿 never by itself is pronounced in either the first tone or neutral tone so in either case there is no possible ambiguity with the full syllable ㄦ.

https://chinese.stackexchange.com/a/25484

In short, 儿, 儿1, 儿5 == final R; 儿2,儿3,儿4 == ER syllable.

Pinyin represents this by simply adding "r", either without a tone (== tone 5 in Pinyin), or with tone 1, or with explicit tone 5. Wade-Giles does the same.

CC-CEDICT in particular has plenty of entries with an "r5" syllable, usually as a reading for the characters 兒 and/or 儿.

Implemented this by adding ugly conditionals to the RPY. Something like this (paraphrased):

儿,儿1,儿5 -> -r
儿2,儿3,儿4 -> er
r* -> 儿*
er* -> 儿*
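
The paraphrased rules above can be sketched as a pair of small functions. This is a hedged illustration in Python, not the actual RPY conditionals: the function names are hypothetical, the syllable is written with the Bopomofo letter ㄦ (U+3126), and the tone digit is assumed to be carried over unchanged.

```python
import re

ER = "\u3126"  # Bopomofo letter for the er syllable

# Pinyin side: "r" or "er", optionally followed by a tone digit 1-5.
_PINYIN_ER = re.compile(r"(er|r)([1-5]?)")


def er_to_pinyin(syllable: str) -> str:
    """Bopomofo -> Pinyin: no tone, tone 1 or tone 5 is the final -r; tones 2-4 are the full er."""
    if not syllable.startswith(ER):
        raise ValueError(f"not an er syllable: {syllable!r}")
    tone = syllable[len(ER):]
    if tone in ("", "1", "5"):
        return "r" + tone
    if tone in ("2", "3", "4"):
        return "er" + tone
    raise ValueError(f"unexpected tone: {tone!r}")


def pinyin_to_er(syllable: str) -> str:
    """Pinyin -> Bopomofo: both r* and er* map to the same letter with the same tone."""
    match = _PINYIN_ER.fullmatch(syllable)
    if match is None:
        raise ValueError(f"not r/er: {syllable!r}")
    return ER + match.group(2)


print(er_to_pinyin(ER + "5"), er_to_pinyin(ER + "3"))
```

Note that the mapping is not symmetric: under these rules er1 converts to the letter with tone 1, which converts back to r1 rather than er1.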

In other words:

Non-standard Bopomofo letters

Wikipedia says:

Three letters formerly used in non-standard dialects of Mandarin are now also used to write other Chinese varieties. Some Zhuyin fonts do not contain these letters.

  ㄪ v v v
  ㄫ ŋ ng ng
  ㄬ ɲ gn ny

  1. I'm not sure whether these "non-standard dialect" NG/HNG and the above ㄣㄍ/ㄏㄣㄍ, which many sources list as "exception syllables in Pinyin", are conceptually the same thing. Maybe someone with better knowledge can tell. For now I'm treating them as different cases.

  2. It seems that these NG/HNG have sometimes previously been written with 儿 [ŋ] ng too! Probably in these dialects 儿 represented "ng" and not "-r".

These characters are now officially part of Unicode, but they are still very uncommon and most fonts don't have them. I'm not going to bother with them for now unless someone asks for them.

If someone asks for them, there's an option of adding them as a separate RPY so that 1. they work over all RPYs and 2. you can prioritize what you want your ng to mean (ㄏㄣㄍ/ㄫ).
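
A rough sketch of how such a separate, prioritized table might behave is given below. Everything here is hypothetical (the table contents, the names BASE, DIALECT and lookup); it only illustrates the idea of consulting tables in a user-chosen order, and says nothing about how Wakan actually loads RPYs.

```python
from typing import Dict, List, Optional

Table = Dict[str, str]

# Placeholder tables, for illustration only.
BASE: Table = {"ng": "\u3123\u310D", "hng": "\u310F\u3123\u310D"}  # ㄣㄍ, ㄏㄣㄍ
DIALECT: Table = {"ng": "\u312B"}  # ㄫ, the non-standard NG letter


def lookup(syllable: str, tables: List[Table]) -> Optional[str]:
    """Return the mapping from the first table that knows the syllable."""
    for table in tables:
        if syllable in table:
            return table[syllable]
    return None


# Listing the dialect table first makes "ng" resolve to the non-standard letter,
# while "hng" still falls through to the base table.
print(lookup("ng", [DIALECT, BASE]))
print(lookup("hng", [DIALECT, BASE]))
```

Swapping the order of the two tables flips which meaning of ng wins, which is the prioritization described above.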