jacksonllee / pycantonese

Cantonese Linguistics and NLP
https://pycantonese.org
MIT License

Parse Chinese characters to Jyutping #18

Closed: mirfan899 closed this issue 3 years ago

mirfan899 commented 5 years ago

I've gone through the documentation but could not find how to convert a Chinese sentence to Jyutping. For example, something like this:

import pycantonese as pc

pc.parse_to_jyutping("我係香港人")
'ngo5 hai6 hoeng1 gong2 jan4'

jacksonllee commented 5 years ago

Hello, the characters-to-jyutping functionality isn't available yet, but is definitely a very reasonable feature for pycantonese. Hopefully we can get to implementing it sooner rather than later.

alopezz commented 3 years ago

Hi!

@jacksonllee Do you accept contributions? I may be willing to give this a shot.

laubonghaudoi commented 3 years ago

@mirfan899 We have a Jyutping conversion tool here, please try it out: https://github.com/CanCLID/ToJyutping

mirfan899 commented 3 years ago

Great!!! Thanks.

jacksonllee commented 3 years ago

Hello everyone, I finally got around to finishing some related work and have just released pycantonese v2.4.1 to PyPI (pip install -U pycantonese should install the latest release). The new characters2jyutping() function does what this ticket asks for:

>>> import pycantonese as pc
>>> pc.characters2jyutping('我係香港人')
['ngo5', 'hai6', 'hoeng1', 'gong2', 'jan4']
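
The original request in this thread asked for a single space-separated string rather than a list. Assuming the list output shown above, a plain str.join gives that form. This is ordinary Python applied to the result, not a separate pycantonese feature, and the variable names are only illustrative:

```python
# The list below is copied from the output shown in this thread;
# in real use it would come from pc.characters2jyutping(...).
syllables = ['ngo5', 'hai6', 'hoeng1', 'gong2', 'jan4']

# Join the syllables with single spaces to match the format in the request.
sentence = ' '.join(syllables)
print(sentence)
```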

Docs: http://pycantonese.org/jyutping.html#characters-to-jyutping-conversion

(Relatedly, this new pycantonese release also includes word segmentation, which the characters-to-jyutping conversion functionality depends on: http://pycantonese.org/word_segmentation.html)
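
To illustrate why the conversion would depend on word segmentation, here is a minimal, self-contained sketch of the general idea: segment the text into words first, then look up each word's Jyutping. It is not pycantonese's actual implementation. The lexicon, the greedy longest-match strategy, and the names TOY_LEXICON, toy_segment, and toy_characters_to_jyutping are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy lexicon mapping words to Jyutping syllables.
# This is illustration data only, not pycantonese's data.
TOY_LEXICON = {
    "我": ["ngo5"],
    "係": ["hai6"],
    "人": ["jan4"],
    "香港": ["hoeng1", "gong2"],
    "香港人": ["hoeng1", "gong2", "jan4"],
}
MAX_WORD_LEN = max(len(word) for word in TOY_LEXICON)


def toy_segment(text):
    """Greedy longest-match segmentation against TOY_LEXICON."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback.
            if candidate in TOY_LEXICON or length == 1:
                words.append(candidate)
                i += length
                break
    return words


def toy_characters_to_jyutping(text):
    """Segment first, then look up each word; unknown words give None."""
    result = []
    for word in toy_segment(text):
        syllables = TOY_LEXICON.get(word)
        if syllables is None:
            result.append(None)
        else:
            result.extend(syllables)
    return result


print(toy_segment("我係香港人"))
print(toy_characters_to_jyutping("我係香港人"))
```

Looking up whole words rather than isolated characters matters because some characters have more than one reading, and the word they appear in usually determines which reading applies. A real segmenter and lexicon would be far larger and more careful than this toy version.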

I'm closing this ticket as resolved. Please feel free to let me know if there are other questions. Thanks!