twitter / twitter-cldr-rb

Ruby implementation of the ICU (International Components for Unicode) that uses the Common Locale Data Repository to format dates, plurals, and more.
Apache License 2.0

Add dictionary-based segmentation support #230

Closed. camertron closed this 4 years ago

camertron commented 4 years ago

Allows segmenting scripts that require a dictionary to find word breaks. Supports Chinese, Japanese, Korean, Lao, Thai, Khmer, and Burmese.
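For readers unfamiliar with the idea, here is a minimal sketch of dictionary-based word breaking using greedy longest-match. It is not the code in this PR: `DICTIONARY`, `MAX_WORD_LENGTH`, and `segment` are hypothetical names, the four-entry word list is a toy, and real implementations typically use far larger dictionaries and more careful matching than a greedy scan.

```ruby
require 'set'

# Hypothetical toy dictionary; a real one holds many thousands of entries.
DICTIONARY = Set.new(%w[我 爱 北京 天安门]).freeze
MAX_WORD_LENGTH = DICTIONARY.map(&:length).max

# Greedy longest-match segmentation (illustrative only, not the PR's algorithm).
def segment(text, dictionary = DICTIONARY, max_len = MAX_WORD_LENGTH)
  words = []
  pos = 0
  while pos < text.length
    # Start with the longest candidate and shrink until the dictionary matches.
    len = [max_len, text.length - pos].min
    len -= 1 while len > 1 && !dictionary.include?(text[pos, len])
    # If nothing matched, len is 1 and a single character is emitted.
    words << text[pos, len]
    pos += len
  end
  words
end

p segment('我爱北京天安门')
```

Scripts such as Chinese or Thai do not put spaces between words, so whitespace-based rules cannot find the boundaries; a word list supplies them instead.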

coveralls commented 4 years ago

Coverage increased (+0.4%) to 95.577% when pulling 68937b9226e18c2b8e5fbf02e354940a93f1765b on dictionary_segmentation into ea16c16c3e183b471927354017db59adddc5da43 on master.