osori / korean-romanizer

A Python library for Korean romanization
https://korean-romanizer.ij.fyi

KeyError: 12640 #11

Open prestto opened 3 years ago

prestto commented 3 years ago

Hi, apologies, I can't give much insight as to why this throws an error, but:

from korean_romanizer.romanizer import Romanizer
Romanizer('경인로 34번길 79-2  ㅠ동 201호(숭의').romanize()

yields the error:

KeyError                                  Traceback (most recent call last)
<ipython-input-4-429dd49ff95b> in <module>
     12 
     13 # romanize_kr()
---> 14 Romanizer('경인로 34번길 79-2  ㅠ동 201호(숭의').romanize()

~/.python_virtualenvs/scripts-Ot3yg93O/lib/python3.7/site-packages/korean_romanizer/romanizer.py in romanize(self)
    110                 s = Syllable(char)
    111                 #try:
--> 112                 _romanized += onset[s.initial] + vowel[s.medial] + coda[s.final]
    113                 #except Exception as e:
    114                 #    _romanized += "[에러:" + str(e) + "]"

KeyError: 12640

Sorry I can't be of more help fixing this (I don't speak Korean). But good luck with the project, super useful.

osori commented 3 years ago

First of all, thank you so much for opening this issue! I'm sorry you ran into this error.

This error comes from a typo in the input: ㅠ on its own is not a complete syllable (cf. 유, which is a complete syllable). I guess the user meant to write B, because ㅠ and B share the same key on the Korean keyboard (see the picture below). In other words, the user probably meant to write B동, not ㅠ동.
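For readers who don't read Korean, the number in the KeyError is the decimal Unicode code point of ㅠ (U+3160, a standalone "compatibility jamo"). Complete syllables such as 유 live in a separate block (U+AC00 to U+D7A3) and can be split arithmetically into initial, medial, and final indices. The snippet below is only a rough sketch of that distinction; `is_complete_syllable` and `decompose` are hypothetical helper names for illustration, not functions from korean_romanizer, and the library's own `Syllable` class may work differently.

```python
# Illustrative helpers only; these names are NOT part of korean_romanizer.
HANGUL_SYLLABLE_FIRST = 0xAC00  # 가, first precomposed Hangul syllable
HANGUL_SYLLABLE_LAST = 0xD7A3   # 힣, last precomposed Hangul syllable


def is_complete_syllable(char: str) -> bool:
    """Return True if char is a precomposed Hangul syllable."""
    return HANGUL_SYLLABLE_FIRST <= ord(char) <= HANGUL_SYLLABLE_LAST


def decompose(char: str) -> tuple:
    """Split a precomposed syllable into (initial, medial, final) indices.

    Standard Unicode arithmetic: 21 medials * 28 finals = 588 per initial.
    """
    if not is_complete_syllable(char):
        raise ValueError(f"{char!r} is not a complete Hangul syllable")
    offset = ord(char) - HANGUL_SYLLABLE_FIRST
    return offset // 588, (offset % 588) // 28, offset % 28


print(ord("ㅠ"))                   # the number seen in the KeyError
print(is_complete_syllable("ㅠ"))  # standalone jamo, not a syllable
print(is_complete_syllable("유"))  # complete syllable
print(decompose("유"))             # indices of ㅇ, ㅠ, and "no final"
```

A bare ㅠ falls outside the syllable block, so any lookup table keyed by syllable-derived indices has no entry for it.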

image

My proposed fix is to not romanize such a character and just return it unchanged. I uploaded a new version, 0.22, to PyPI, and you can try it by upgrading the library with pip. However, if you would still like to force romanization of this character, you could uncomment lines 113-119 and comment out line 120 in romanizer.py.
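The pass-through behaviour described above can be sketched roughly as follows. This is not the actual code in romanizer.py; `romanize_with_passthrough` and the `romanize_syllable` callback are hypothetical names, and the one-entry lookup in the usage example is a stand-in for the library's real tables.

```python
# A minimal sketch of "return the original character if it is not a
# complete syllable". Names here are hypothetical, not the library's API.
from typing import Callable


def romanize_with_passthrough(
    text: str, romanize_syllable: Callable[[str], str]
) -> str:
    """Romanize complete Hangul syllables; leave everything else unchanged."""
    parts = []
    for char in text:
        if 0xAC00 <= ord(char) <= 0xD7A3:
            # Precomposed Hangul syllable: safe to hand to the romanizer.
            parts.append(romanize_syllable(char))
        else:
            # Standalone jamo, digits, punctuation, Latin letters, spaces.
            parts.append(char)
    return "".join(parts)


# Usage with a stand-in romanizer that only knows one syllable.
demo_table = {"동": "dong"}
print(romanize_with_passthrough("ㅠ동 201", lambda s: demo_table[s]))
```

With this approach the stray ㅠ is copied to the output as-is instead of raising a KeyError.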

Again, thank you so much, and let me know if you need any other help!