iki / unidecode

Unicode transliteration in Python (clone of Tomaž Šolc's repository at zemanta.com)
http://www.tablix.org/~avian/blog/archives/2009/01/unicode_transliteration_in_python/

I wanna "香港" to be "Hong Kong", not the Fxxking "Xiang Gang" #7

Status: Open. dukelec opened this issue 9 years ago.

dukelec commented 9 years ago

Why are Chinese characters always transliterated as Pinyin? Pinyin is only used for Mandarin (Putonghua). Chinese people in Kwangtung, Hong Kong and Macao use Cantonese, not Mandarin!

iki commented 9 years ago

Oops, I've never used Cantonese or Mandarin. Can you please open a PR with the proposed change?

miurahr commented 9 years ago

To support this behavior, the library needs to be aware of the preferred language. Unidecode has a single mapping table shared by all text, so it cannot satisfy this request.
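To illustrate what "aware of the preferred language" could mean, here is a minimal sketch that keeps one table per language and falls back to a default table. The tables, the `transliterate` function, and the language codes (`cmn` for Mandarin, `yue` for Cantonese) are hypothetical choices for this example; this is not unidecode's API and not how unihandecode is implemented.

```python
from typing import Dict, Optional

# Hypothetical per-language tables, two characters each, for illustration only.
# Values end with a space, mimicking how unidecode separates CJK syllables.
TABLES: Dict[str, Dict[str, str]] = {
    "cmn": {"香": "Xiang ", "港": "Gang "},  # Mandarin, Pinyin without tones
    "yue": {"香": "Hoeng ", "港": "Gong "},  # Cantonese, Jyutping without tones
}
DEFAULT_LANG = "cmn"


def transliterate(text: str, lang: Optional[str] = None) -> str:
    """Transliterate text with the table for `lang`, falling back to the default."""
    table = TABLES.get(lang or DEFAULT_LANG, TABLES[DEFAULT_LANG])
    fallback = TABLES[DEFAULT_LANG]
    pieces = []
    for ch in text:
        # Prefer the requested language, then the default table, then the char itself.
        pieces.append(table.get(ch) or fallback.get(ch, ch))
    # Drop the trailing separator left by the last syllable.
    return "".join(pieces).rstrip()


print(transliterate("香港"))             # Xiang Gang
print(transliterate("香港", lang="yue"))  # Hoeng Gong
```

Note that a systematic Cantonese romanization such as Jyutping gives "Hoeng Gong", not the conventional spelling "Hong Kong", so matching established place names exactly would also need an exceptions table on top of per-language readings.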

I've developed an enhanced version of unidecode, named unihandecode, that has basic support for this: https://github.com/miurahr/unihandecode

You can see experimental support for Cantonese in the master branch (look at the test code). It uses the reading definitions from the Unicode standard's Unihan database as its dictionary. This may help you.
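As a rough sketch of how Unihan readings can serve as a per-language dictionary: the Unihan database ships a tab-separated file, Unihan_Readings.txt, whose lines hold a code point, a field name such as `kCantonese` or `kMandarin`, and a value. The `SAMPLE` lines and the `load_readings` helper below are illustrative assumptions written in that layout (check the values against the real file); this is not code taken from unihandecode.

```python
from typing import Dict, Iterable

# Illustrative excerpt in the tab-separated Unihan_Readings.txt layout
# (code point, field name, value). Verify values against the real file.
SAMPLE = """\
# comment lines start with '#'
U+6E2F\tkCantonese\tgong2
U+6E2F\tkMandarin\tgǎng
U+9999\tkCantonese\thoeng1
U+9999\tkMandarin\txiāng
"""


def load_readings(lines: Iterable[str], field: str) -> Dict[str, str]:
    """Hypothetical helper: map each character to its first reading in `field`."""
    table: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code, key, value = line.split("\t", 2)
        if key != field:
            continue
        char = chr(int(code[2:], 16))  # "U+9999" -> "香"
        # A field may list several space-separated readings; keep the first.
        table[char] = value.split()[0]
    return table


cantonese = load_readings(SAMPLE.splitlines(), "kCantonese")
print(" ".join(cantonese.get(ch, ch) for ch in "香港"))  # hoeng1 gong2
```

Building one table per field this way gives exactly the kind of language-specific mapping that a single shared table cannot provide.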