suminb / hanja

Hangul and Hanja library

Being aware of some hanjas' phonetic changes #7

Open dahlia opened 8 years ago

dahlia commented 8 years ago

Some hanjas like 金/讀/畵 can be pronounced in different ways. The current behavior can produce incorrect results in some cases e.g.:

See also the following table:

| Hanja | Word 1 | Word 2 |
|-------|--------|--------|
| 金 | 金剛經 (금강경) | 金浦國際空港 (김포국제공항) |
| 讀 | 讀書 (독서) | 句讀點 (구두점) |
| 畵 | 畵龍點睛 (화룡점정) | (기) |

suminb commented 8 years ago

Thanks for your report. I'm unable to investigate this issue at the moment, but I'll try to re-visit this sometime this week.

suminb commented 7 years ago

Sorry for the late response. It's almost been a year 😆

I looked into this briefly, and it looks like there is no easy way to deal with this issue other than making a huge rule table. Or maybe I'm missing something... If anyone could suggest a solution for this, it would be much appreciated.

chaaklau commented 4 years ago

A huge rule table would be the easiest solution. Since you are using a mapping file, if you have a list of phrases that do not use the most common reading, you can place those phrases at the top of the file so that they are matched first.

For example, the hanja 金 has two readings, 금 and 김. 김 appears in countless names, so you can't possibly list every name that uses it. If you instead have a list of all the '金' words read as 금, e.g. 대금 (代金), 금고 (金庫), 금요일 (金曜日), put them at the top of your list. If no phrase matches, fall back to the default pronunciation. The table will grow about 10 times bigger, but that should not affect run time much.
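The idea above could be sketched roughly as follows. This is only an illustration, not the library's actual code: `PHRASE_READINGS`, `DEFAULT_READINGS`, and `translate` are hypothetical names, the tables are tiny hand-written samples rather than the real mapping file, and 김 is assumed to be the default reading of 金. Instead of relying on file order, the sketch tries the longest phrase first, which has the same effect as putting phrases above single characters.

```python
# Hypothetical sample data; NOT taken from the hanja library's mapping file.
# Per-character default readings (金 is assumed to default to 김 here).
DEFAULT_READINGS = {
    "金": "김",
    "代": "대",
    "庫": "고",
    "曜": "요",
    "日": "일",
    "浦": "포",
}

# Phrases whose reading differs from the per-character default.
PHRASE_READINGS = {
    "代金": "대금",
    "金庫": "금고",
    "金曜日": "금요일",
}

MAX_PHRASE_LEN = max(len(phrase) for phrase in PHRASE_READINGS)


def translate(text: str) -> str:
    """Convert hanja to hangul, preferring the longest matching phrase."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate phrase first, down to two characters.
        for length in range(min(MAX_PHRASE_LEN, len(text) - i), 1, -1):
            reading = PHRASE_READINGS.get(text[i:i + length])
            if reading is not None:
                out.append(reading)
                i += length
                break
        else:
            # No phrase matched: fall back to the per-character default,
            # leaving characters without an entry untouched.
            out.append(DEFAULT_READINGS.get(text[i], text[i]))
            i += 1
    return "".join(out)
```

With these sample tables, `translate("金曜日")` uses the phrase entry, while `translate("金浦")` finds no phrase and falls back to the per-character defaults. Each position costs at most `MAX_PHRASE_LEN` dictionary lookups, so a much larger phrase table mainly costs memory rather than run time.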