polm / cutlet

Japanese to romaji converter in Python
https://polm.github.io/cutlet/
MIT License

KeyError: 'ー' #9

ykim closed this issue 4 years ago

ykim commented 4 years ago

I ran into another issue parsing a title of a book, ティンクル☆くるせいだーすGoGo!(1). The error is below:

% cutlet
ティンクル☆くるせいだーすGoGo!(1)
Traceback (most recent call last):
  File "/Users/ykim/.local/share/virtualenvs/sandbox-nIHPi2Hu/bin/cutlet", line 14, in <module>
    print(katsu.romaji(line.strip()))
  File "/Users/ykim/.local/share/virtualenvs/sandbox-nIHPi2Hu/lib/python3.8/site-packages/cutlet/cutlet.py", line 127, in romaji
    roma = self.romaji_word(word)
  File "/Users/ykim/.local/share/virtualenvs/sandbox-nIHPi2Hu/lib/python3.8/site-packages/cutlet/cutlet.py", line 191, in romaji_word
    return self.map_kana(kana)
  File "/Users/ykim/.local/share/virtualenvs/sandbox-nIHPi2Hu/lib/python3.8/site-packages/cutlet/cutlet.py", line 231, in map_kana
    out += self.get_single_mapping(pk, char, nk)
  File "/Users/ykim/.local/share/virtualenvs/sandbox-nIHPi2Hu/lib/python3.8/site-packages/cutlet/cutlet.py", line 264, in get_single_mapping
    return self.table[kk]
KeyError: 'ー'

This may be related to the changes from #7 and/or #8. This string did not raise an error before either change. My guess is that the ☆ in the middle is causing the problem.

polm commented 4 years ago

Thanks for the bug report.

The issue here is not the star; it's the character that looks like a hyphen. That's actually a half-width long vowel stroke (長音符). I didn't have any handling for half-width katakana, so those characters were failing at table lookup time, which is where the KeyError comes from.
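
To illustrate the character involved, here is a minimal sketch using Python's standard `unicodedata` module. It is not necessarily how cutlet 0.1.9 handles the case. It only shows that NFKC normalization folds half-width katakana, including the half-width long vowel mark U+FF70, into their full-width equivalents. The helper name `to_fullwidth_kana` is hypothetical.

```python
import unicodedata

HALFWIDTH_CHOON = "\uff70"  # half-width long vowel mark, easy to mistake for a hyphen
FULLWIDTH_CHOON = "\u30fc"  # ordinary full-width long vowel mark


def to_fullwidth_kana(text: str) -> str:
    """Hypothetical helper: NFKC-normalize text so half-width kana become full-width."""
    return unicodedata.normalize("NFKC", text)


# Print the official Unicode name of the half-width character.
print(unicodedata.name(HALFWIDTH_CHOON))
# Check that normalization yields the full-width mark.
print(to_fullwidth_kana(HALFWIDTH_CHOON) == FULLWIDTH_CHOON)
```

Note that NFKC also rewrites other compatibility characters (for example full-width Latin letters), so normalizing an entire input this way may change more than the kana.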

This worked in older versions of the code because unknown characters were passed through unchanged. That would have been bad with text like this: the output would look like normal ASCII but would still contain a non-ASCII character that has to be encoded in URLs and similar contexts. Thanks for helping me catch it!
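
A small sketch of why passing the character through is a problem, using only the standard library: the half-width mark is not ASCII, so it gets percent-encoded in a URL, while a real hyphen does not. The helper name `needs_url_encoding` is hypothetical and is not part of cutlet.

```python
from urllib.parse import quote

HALFWIDTH_CHOON = "\uff70"  # looks like a hyphen but is not ASCII


def needs_url_encoding(text: str) -> bool:
    """Hypothetical check: True if percent-encoding would change the text."""
    return quote(text, safe="") != text


print(HALFWIDTH_CHOON.isascii())            # the half-width mark is not ASCII
print(quote(HALFWIDTH_CHOON))               # percent-encoded UTF-8 bytes
print(needs_url_encoding("-"))              # a real ASCII hyphen is left alone
print(needs_url_encoding(HALFWIDTH_CHOON))  # the look-alike is not
```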

I just released 0.1.9, which should fix this issue.