jacksonllee / pycantonese

Cantonese Linguistics and NLP
https://pycantonese.org
MIT License

Simplified Chinese characters not supported #45

Open · zhiqiuiyiye opened this issue 6 months ago

zhiqiuiyiye commented 6 months ago

I tried to use pycantonese to convert characters to Jyutping, but found that some characters cannot be converted. For example, given

txt = '昆省急救服务中心嘅医护人员昆省警方。'

the output is:

[('昆', 'gwan1'), ('省', 'saang2'), ('急救', 'gap1gau3'), ('服', 'fuk6'), ('务', None), ('中心', 'zung1sam1'), ('嘅', 'ge3'), ('医', 'ai3'), ('护', None), ('人', 'jan4'), ('员', None), ('昆', 'gwan1'), ('省', 'saang2'), ('警方', 'ging2fong1'), ('。', None)]

As you can see, '务', '护', and '员' come back as None.
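To spot the failing segments without scanning by eye, the reported output can be filtered for entries whose Jyutping is None. This is a minimal sketch that works on the exact list shown in this issue. The helper name `unresolved` is hypothetical and not part of pycantonese.

```python
from typing import List, Optional, Tuple

# Output reported in this issue: (segment, Jyutping or None) pairs.
reported = [
    ("昆", "gwan1"), ("省", "saang2"), ("急救", "gap1gau3"), ("服", "fuk6"),
    ("务", None), ("中心", "zung1sam1"), ("嘅", "ge3"), ("医", "ai3"),
    ("护", None), ("人", "jan4"), ("员", None), ("昆", "gwan1"),
    ("省", "saang2"), ("警方", "ging2fong1"), ("。", None),
]


def unresolved(segments: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Hypothetical helper: return the segments that got no Jyutping."""
    return [chars for chars, jyutping in segments if jyutping is None]


print(unresolved(reported))  # ['务', '护', '员', '。']
```

The result is the three simplified characters plus the full-width punctuation mark.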

jacksonllee commented 6 months ago

Hello! Currently, pycantonese (as of v3.4.0) supports only traditional characters. If your input contains simplified characters, you may consider piping it through a converter (such as OpenCC) before passing it to pycantonese.
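A minimal sketch of the maintainer's suggestion: convert simplified input to traditional characters first, then pass it to pycantonese. It assumes `pycantonese.characters_to_jyutping` (which returns (segment, Jyutping) pairs like the output in this issue) and an OpenCC Python binding such as `opencc-python-reimplemented`, whose `OpenCC("s2t")` object provides a `.convert()` method. The function names `simplified_to_jyutping` and `default_pipeline` are hypothetical, not part of either library.

```python
from typing import Callable, List, Optional, Tuple

# (segment, Jyutping or None), the shape shown in this issue.
Segment = Tuple[str, Optional[str]]


def simplified_to_jyutping(
    text: str,
    to_traditional: Callable[[str], str],
    to_jyutping: Callable[[str], List[Segment]],
) -> List[Segment]:
    """Hypothetical helper: convert to traditional characters first,
    then hand the result to a Jyutping converter."""
    return to_jyutping(to_traditional(text))


def default_pipeline(text: str) -> List[Segment]:
    """Hypothetical wiring with the real packages, imported lazily.

    Assumes `pip install pycantonese opencc-python-reimplemented`
    (or another OpenCC binding importable as `opencc`).
    """
    import opencc
    import pycantonese

    converter = opencc.OpenCC("s2t")  # Simplified -> Traditional
    return simplified_to_jyutping(
        text, converter.convert, pycantonese.characters_to_jyutping
    )
```

The converters are passed in as plain callables so the wiring can be tested with stubs or swapped for a different OpenCC configuration; for example, OpenCC also ships `s2hk` (Simplified to Hong Kong Traditional), which may be a closer fit for Cantonese text. Note that the returned segments are written in traditional characters (e.g. 務, 護, 員 rather than 务, 护, 员), and the exact segmentation and readings depend on pycantonese's data.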