plone / plone.i18n

Text normalization logic and language, country, cctld data.

Add zh-cn normalizer for Chinese #8

Closed yangh closed 11 years ago

yangh commented 11 years ago

It has been a long road to get here: Zope ChinaPak from Panjunyong in 2004, Product.ChinaPak from Andelf in 2010, and plone.i18n from Penguin in 2013.

davisagli commented 11 years ago

Can you and other contributors to this code please sign the Plone contributor agreement? https://buildoutcoredev.readthedocs.org/en/latest/agreement.html

jianaijun commented 11 years ago

Hi, I disagree with this pull request: it can normalize fewer than 50% of Chinese characters, and the output is ugly.

yangh commented 11 years ago

But it covers the most common part, about 6500 Chinese characters; for the rest, it falls back to the base normalizer.

Could you please give some example which looks ugly?
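The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the three-entry `PINYIN` table and the function name `normalize_zh` are hypothetical, and the assumption that the base normalizer writes an unmapped non-ASCII character as its hexadecimal code point is inferred from hex runs such as `9ad4` in the outputs quoted in this thread.

```python
# Hypothetical three-entry table; the table in the pull request is said
# to hold about 6500 characters.
PINYIN = {"繁": "fan", "中": "zhong", "文": "wen"}


def normalize_zh(text, table=PINYIN):
    """Map known characters to pinyin; fall back for everything else."""
    parts = []
    for ch in text:
        if ch in table:
            # Character is covered by the pinyin table.
            parts.append(table[ch])
        elif ord(ch) < 128:
            # Plain ASCII passes through unchanged.
            parts.append(ch)
        else:
            # Assumed base-normalizer behaviour: hex code point.
            parts.append("%x" % ord(ch))
    return "".join(parts)


print(normalize_zh("繁體中文"))  # prints "fan9ad4zhongwen"
```

Because 體 is missing from the table, it surfaces as `9ad4` in the middle of the result, which is the kind of output the coverage question is about.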

yangh commented 11 years ago

Btw, do you have any advice on a better approach to normalizing Chinese characters in Plone?

It would be better to get more discussion and agreement from the locale community on this pull request before it is merged.

jianaijun commented 11 years ago

Refer: https://pypi.python.org/pypi/readset.i18n

It has been used in multiple production sites and works well.

jianaijun commented 11 years ago

"But it covered the most popular part": it covers only the most common Simplified Chinese characters, and not all of them. Almost no Traditional Chinese characters (including the most common ones) are normalized.

又叫正體中文,中國之外就叫傳統中文,係指同簡體中文相對嘅中文

jianaijun commented 11 years ago

I'm sorry, the previous comment was sent by mistake.

"But it covered the most popular part": it covers only the most common Simplified Chinese characters, and not all of them. Almost no Traditional Chinese characters (including the most common ones) are normalized.

"Could you please give some example which looks ugly?"

Example 1: 它--包括最受欢迎的“简体”中文而已,而不是最受欢迎的“中文”而。 Output: ta-baokuozuishouhuanyingde201cjianti201dzhongweneryi-erbushizuishouhuanyingde201czhongwen201der

Example 2: 繁體中文又叫正體中文,中國之外就叫傳統中文,係指同簡體中文相對嘅中文。 Output: fan9ad4zhongwenyoujiaozheng9ad4zhongwen-zhong570bzhiwaijiujiao50b37d71zhongwen-4fc2zhitong7c219ad4zhongwenxiang5c0d5605zhongwen

Please refer to the output format of https://pypi.python.org/pypi/readset.i18n.

jianaijun commented 11 years ago

The output format of https://pypi.python.org/pypi/readset.i18n:

Example 1: 它--包括最受欢迎的“简体”中文而已,而不是最受欢迎的“中文”而。 Output: ta-bao-kuo-zui-shou-huan-ying-de-jian-ti-zhong-wen-er-yi-er-bu-shi-zui-shou-huan-ying-de-zhong-wen-er

Example 2: 繁體中文又叫正體中文,中國之外就叫傳統中文,係指同簡體中文相對嘅中文。 Output: fan-ti-zhong-wen-you-jiao-zheng-ti-zhong-wen-zhong-guo-zhi-wai-jiu-jiao-chuan-tong-zhong-wen-xi-zhi-tong-jian-ti-zhong-wen-xiang-dui-kai-zhong-wen
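The output style shown above, one hyphen-separated syllable per character with punctuation left out, could be produced along these lines. This is only a sketch and not the readset.i18n implementation: the `PINYIN` table and the name `slug_zh` are hypothetical, and silently dropping characters that are not in the table is an assumption made for the example.

```python
# Hypothetical four-entry table, for illustration only.
PINYIN = {"简": "jian", "体": "ti", "中": "zhong", "文": "wen"}


def slug_zh(text, table=PINYIN):
    """Join pinyin syllables and ASCII words with hyphens."""
    tokens = []
    word = ""  # Current run of ASCII letters or digits.
    for ch in text:
        if ch.isascii() and ch.isalnum():
            word += ch.lower()
            continue
        # Any other character ends the current ASCII word.
        if word:
            tokens.append(word)
            word = ""
        if ch in table:
            tokens.append(table[ch])
        # Assumption: punctuation and unmapped characters are dropped.
    if word:
        tokens.append(word)
    return "-".join(tokens)


print(slug_zh("“简体”中文"))  # prints "jian-ti-zhong-wen"
```

Keeping one token per character makes the syllable boundaries visible in the URL, and dropping the curly quotes avoids the `201c` and `201d` runs seen in the earlier output.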

yangh commented 11 years ago

Hi, jianaijun

Thanks for introducing the readset.i18n product; it's the best implementation as far as I know. Thanks for your work.

BR. yangh

jianaijun commented 11 years ago

To avoid "reinventing the wheel", I have submitted a Chinese normalizer patch: https://github.com/plone/plone.i18n/pull/9