languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

Separating Traditional and Simplified Chinese (zh --> zh-hant/zh-hans) #6100

[Open] babycaseny opened this issue 2 years ago

babycaseny commented 2 years ago

Some written forms are considered correct in zh-hans (zh-cn + zh-sg) but wrong in zh-hant (zh-tw + zh-hk). Is there any way to separate the two? Do we have to modify the source code, or can this be done outside of it?

danielnaber commented 2 years ago

You would have to modify Chinese.java so it has different subclasses, like English.java already does.
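The subclass approach could look roughly like the following minimal sketch, assuming the two variants need to differ only in their locale metadata and in which variant-specific rules they enable. All class names, method names and rule IDs here are hypothetical; this is a standalone illustration of the pattern, not LanguageTool's actual `Language` API or the real contents of Chinese.java or English.java. It needs Java 9+ for `List.of`/`Set.of`.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch only: mirrors the idea of per-variant subclasses
// (as English.java has), but is NOT LanguageTool's real Language API.
public class ChineseVariantsSketch {

  // Shared base class: anything common to all written Chinese
  // (e.g. segmentation, tagging) would live here.
  abstract static class Chinese {
    String getShortCode() { return "zh"; }

    // Regions covered by this variant.
    abstract List<String> getCountries();

    // Script subtag used in the variant's language tag.
    abstract String getScript();

    String getShortCodeWithVariant() { return getShortCode() + "-" + getScript(); }

    // Rule IDs enabled only for this variant (hypothetical IDs).
    abstract Set<String> getVariantRuleIds();
  }

  static class SimplifiedChinese extends Chinese {
    List<String> getCountries() { return List.of("CN", "SG"); }
    String getScript() { return "hans"; }
    Set<String> getVariantRuleIds() { return Set.of("ZH_HANS_PREFER_SIMPLIFIED_FORMS"); }
  }

  static class TraditionalChinese extends Chinese {
    List<String> getCountries() { return List.of("TW", "HK"); }
    String getScript() { return "hant"; }
    Set<String> getVariantRuleIds() { return Set.of("ZH_HANT_PREFER_TRADITIONAL_FORMS"); }
  }

  public static void main(String[] args) {
    List<Chinese> variants = List.of(new SimplifiedChinese(), new TraditionalChinese());
    for (Chinese v : variants) {
      // Prints the variant tag, its regions and its variant-only rules.
      System.out.println(v.getShortCodeWithVariant() + " " + v.getCountries() + " " + v.getVariantRuleIds());
    }
  }
}
```

With a split like this, a rule that is valid for only one variant is attached to that subclass, while the shared machinery stays in the base class.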

babycaseny commented 2 years ago

Just a comment: over the past few days I have read through the code for the Chinese module. It appears to call an external Lucene module for segmentation and tagging. In addition, it makes a separate call to convert between Traditional and Simplified Chinese. How that conversion works is unclear, and I hope it is not as simplistic as the converters on many Mainland Chinese websites, which just do a one-to-one character mapping.
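To illustrate the concern about one-to-one mapping, here is a small sketch comparing a pure character-by-character conversion with a phrase-first (greedy longest-match) conversion. It does not reproduce LanguageTool's actual conversion code; the tiny tables are hand-picked examples, and `HansToHantSketch` and its methods are hypothetical names. Dictionary-based converters such as OpenCC take a broadly similar phrase-first approach, backed by much larger tables.

```java
import java.util.Map;

// Illustration only: why a pure one-to-one character mapping is not enough
// for Simplified -> Traditional conversion. These tables are tiny hand-picked
// examples, not LanguageTool's (or any library's) real data.
public class HansToHantSketch {

  // One-to-one character table: each Simplified character gets one default.
  static final Map<Character, Character> CHAR_MAP = Map.of(
      '头', '頭',
      '发', '發',   // 发 can also be 髮 ("hair"); a char table cannot express that
      '后', '後',   // 后 can also stay 后 ("empress")
      '现', '現');

  // Phrase table, consulted first; overrides the character defaults in context.
  static final Map<String, String> PHRASE_MAP = Map.of(
      "头发", "頭髮",  // hair
      "皇后", "皇后"); // empress: 后 must NOT become 後

  static final int MAX_PHRASE_LEN = 2;

  // Naive conversion: map every character independently.
  static String convertCharByChar(String s) {
    StringBuilder out = new StringBuilder();
    for (char c : s.toCharArray()) {
      out.append(CHAR_MAP.getOrDefault(c, c));
    }
    return out.toString();
  }

  // Context-aware conversion: greedy longest match against the phrase table,
  // falling back to the character table, then to the character itself.
  static String convertWithPhrases(String s) {
    StringBuilder out = new StringBuilder();
    int i = 0;
    while (i < s.length()) {
      boolean matched = false;
      for (int len = Math.min(MAX_PHRASE_LEN, s.length() - i); len >= 2; len--) {
        String replacement = PHRASE_MAP.get(s.substring(i, i + len));
        if (replacement != null) {
          out.append(replacement);
          i += len;
          matched = true;
          break;
        }
      }
      if (!matched) {
        char c = s.charAt(i);
        out.append(CHAR_MAP.getOrDefault(c, c));
        i++;
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    for (String word : new String[] {"头发", "发现", "皇后", "以后"}) {
      System.out.println(word + "  naive: " + convertCharByChar(word)
          + "  phrase-aware: " + convertWithPhrases(word));
    }
  }
}
```

The naive version turns 头发 into 頭發 and 皇后 into 皇後, both wrong in standard Traditional Chinese, while the phrase-aware version yields 頭髮 and 皇后 and still converts 发现 to 發現 and 以后 to 以後. The source file must be compiled as UTF-8 (the default on Java 18+).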