jacksonllee / pycantonese

Cantonese Linguistics and NLP
https://pycantonese.org
MIT License

Jyutping-to-Yale output format #9

Closed: jacksonllee closed this issue 8 years ago

jacksonllee commented 8 years ago

Currently, Jyutping-to-Yale conversion always takes a string as input and returns a list of strings, regardless of the number of syllables:

>>> import pycantonese as pc
>>> pc.yale("hoeng1")
['hēung']
>>> pc.yale("hoeng1gong2")
['hēung', 'góng']

It would probably be better to make the input and output data structures consistent, e.g., a string for both. The following changes are planned:

  1. Make a string the default for both the input and the output of the yale() function. Add an optional parameter that switches the output to a list of strings, one per syllable.
  2. Potential Yale ambiguities: The new default string output has to be checked for ambiguities such as Jyutping "hei3hau6" (氣候 climate) --> Yale "heihauh", which is technically ambiguous between "hei'hauh" and "heih'auh". The apostrophe will probably be used consistently as the syllable separator whenever a potential ambiguity is detected.

(h.t. Stephan Stiller)
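
To make the two planned changes concrete, here is a minimal sketch of how already-converted Yale syllables might be joined into a single string. It does not use pycantonese itself: `join_yale`, its `as_list` parameter, and the helper functions are hypothetical names chosen for illustration, and the ambiguity check is a rough, non-exhaustive heuristic rather than a full analysis of Yale syllable structure.

```python
import unicodedata

VOWELS = set("aeiou")


def _plain(syllable):
    """Strip tone diacritics so that letters can be compared directly."""
    decomposed = unicodedata.normalize("NFD", syllable)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def _maybe_ambiguous(left, right):
    """Rough heuristic (an assumption, not exhaustive) for an unclear boundary."""
    l, r = _plain(left), _plain(right)
    # "hei" + "hauh": the "h" could be read as a low-tone marker on the left.
    if l[-1] in VOWELS and r[0] == "h":
        return True
    # A final consonant could be read as the onset of a vowel-initial syllable.
    if l[-1] not in VOWELS and r[0] in VOWELS:
        return True
    return False


def join_yale(syllables, as_list=False):
    """Hypothetical helper: join non-empty Yale syllables into one string.

    With as_list=True, return the syllables as a list instead.
    """
    if as_list:
        return list(syllables)
    result = ""
    for i, syllable in enumerate(syllables):
        # Insert an apostrophe only where the boundary looks ambiguous.
        if i > 0 and _maybe_ambiguous(syllables[i - 1], syllable):
            result += "'"
        result += syllable
    return result


print(join_yale(["hēung", "góng"]))
print(join_yale(["hei", "hauh"]))
print(join_yale(["hei", "hauh"], as_list=True))
```

Under this heuristic, the "hei" + "hauh" boundary gets an apostrophe, while "hēung" + "góng" is joined without one. A real implementation would likely need a proper Yale syllable parser to decide which boundaries are truly ambiguous.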