Wrong Pinyin / No Zhuyin for 奧地利

mthewissen commented 9 years ago

# coding: UTF-8

from dragonmapper import hanzi, transcriptions, __version__
print __version__

au = u'奧'
print hanzi.to_pinyin(au)
print hanzi.to_pinyin(au, accented=False)
print hanzi.to_zhuyin(au)

austria = u'奧地利'
print hanzi.to_pinyin(austria, accented=True)
print hanzi.to_pinyin(austria, accented=False)
print hanzi.to_zhuyin(austria)

Outputs:

0.2.3
ào
ao4
ㄠˋ
Àodìlì
Ào5di4li4
Traceback (most recent call last):
  File "<filepath>", line 13, in <module>
    print hanzi.to_zhuyin(austria)
  File "build/bdist.linux-x86_64/egg/dragonmapper/hanzi.py", line 190, in to_zhuyin
  File "build/bdist.linux-x86_64/egg/dragonmapper/transcriptions.py", line 365, in pinyin_to_zhuyin
  File "build/bdist.linux-x86_64/egg/dragonmapper/transcriptions.py", line 341, in _convert
  File "build/bdist.linux-x86_64/egg/dragonmapper/transcriptions.py", line 229, in pinyin_syllable_to_zhuyin
ValueError: Not a valid syllable: o5

(using the developer branch)

tsroten commented 9 years ago

@mthewissen Thanks for this bug report. I appreciate you pointing out this issue.

In the third-party package zhon, pinyin syllables are defined in regular expression patterns using lowercase letters, so Dragon Mapper passes the re.IGNORECASE flag to match syllables that include uppercase letters. In Python 3, str patterns are Unicode-aware by default, so this works fine. In Python 2, however, re.IGNORECASE on its own only folds the case of ASCII letters, so we also need the re.UNICODE flag to match uppercase pinyin letters with diacritics. Without it, the À in Àodìlì is not treated as part of the syllable ào, which appears to be why the pinyin output comes out as Ào5di4li4 and the Zhuyin conversion fails on o5.

I'll shortly commit a fix that passes the re.UNICODE flag during transcription conversion.
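
For anyone who wants to see the flag behavior in isolation, here is a minimal sketch. It does not use Dragon Mapper's or zhon's actual code: `SYLLABLE` is a hypothetical stand-in for a lowercase syllable pattern, and `matches_syllable` is an illustrative helper. On Python 3, `re.ASCII` is used to approximate what Python 2 does when `re.UNICODE` is missing; on Python 2 that attribute does not exist, so the sketch falls back to plain `re.IGNORECASE`.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical stand-in for a lowercase pinyin syllable pattern
# (not the real pattern from zhon).
SYLLABLE = u'ào'


def matches_syllable(text, flags):
    """Return True if `text` as a whole matches SYLLABLE under `flags`."""
    return re.match(SYLLABLE + u'$', text, flags) is not None


# Case folding limited to ASCII letters. On Python 2 this is what
# re.IGNORECASE does by itself; on Python 3 re.ASCII is needed to get
# the same restriction.
ascii_only = re.IGNORECASE | getattr(re, 'ASCII', 0)

# Case folding that also covers letters with diacritics.
unicode_aware = re.IGNORECASE | re.UNICODE

print(matches_syllable(u'Ào', ascii_only))
print(matches_syllable(u'Ào', unicode_aware))
```

The first call should report no match, because À is never folded to à, while the second should match. The lowercase input `ào` matches under either set of flags, which fits the fact that the single-character, lowercase results in the report were unaffected.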

tsroten commented 9 years ago

I just released version 0.2.4 to PyPI, which includes the fix for this issue. Thanks again for your help.

mthewissen commented 9 years ago

It works fine now. Thanks for the speedy update!

tsroten / dragonmapper

Wrong Pinyin / No Zhuyin for 奧地利 #8