Open babycaseny opened 2 years ago
You would have to modify Chinese.java so it has different subclasses, like English.java already does.
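To make the idea concrete, here is a minimal, self-contained sketch of what "different subclasses" could look like. Every class and method name below (`ChineseBase`, `SimplifiedChinese`, `TraditionalChinese`, `tag()`, `countries()`) is a hypothetical stand-in chosen for illustration. It is not the real API of Chinese.java or English.java, so treat it as a shape to adapt rather than code to copy.

```java
import java.util.List;

// Hypothetical stand-in for a shared Chinese base class (not the project's real class).
abstract class ChineseBase {
    // Language code shared by all variants.
    String shortCode() { return "zh"; }

    // Script subtag of the variant: "Hans" or "Hant".
    abstract String script();

    // Regions this variant is meant to cover.
    abstract List<String> countries();

    // Combined tag such as "zh-Hans", usable to select variant-specific rules.
    String tag() { return shortCode() + "-" + script(); }
}

// Simplified Chinese: zh-cn + zh-sg.
class SimplifiedChinese extends ChineseBase {
    @Override String script() { return "Hans"; }
    @Override List<String> countries() { return List.of("CN", "SG"); }
}

// Traditional Chinese: zh-tw + zh-hk.
class TraditionalChinese extends ChineseBase {
    @Override String script() { return "Hant"; }
    @Override List<String> countries() { return List.of("TW", "HK"); }
}

class VariantDemo {
    public static void main(String[] args) {
        ChineseBase[] variants = { new SimplifiedChinese(), new TraditionalChinese() };
        for (ChineseBase v : variants) {
            System.out.println(v.tag() + " " + v.countries());
        }
    }
}
```

With a split like this, rules that only hold for one script could be attached to the matching subclass, while shared rules stay in the base class.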
Just a comment: I have been reading through the code for the Chinese parts over the past few days. It looks like it calls an external module from Lucene for segmentation and tagging. In addition, there is a separate call that converts between Traditional and Simplified Chinese. How that conversion works is unclear to me, and I hope it is not as simplistic as the converters found on many websites in Mainland China, which just apply a one-to-one character mapping.
Some written forms are considered correct in zh-hans (zh-cn + zh-sg) but wrong in zh-hant (zh-tw + zh-hk). Is there any way we can separate the two? Do we have to modify the source code, or can it be done from outside?
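To illustrate why a one-to-one character mapping is a concern: the Simplified character 发 corresponds to two Traditional characters, 發 (as in 發展, "develop") and 髮 (as in 頭髮, "hair"). The sketch below contrasts a naive per-character table with a lookup that checks a phrase table first. The class name, the tiny tables, and both methods are hypothetical and written only for this example; they say nothing about how the converter actually used by the project works. Characters are written as Unicode escapes so the snippet compiles regardless of source file encoding.

```java
import java.util.Map;

// Hypothetical illustration only; not the project's converter.
class OneToOneConversionSketch {
    // Naive table: each Simplified character gets exactly one Traditional output.
    static final Map<Character, Character> CHAR_TABLE = Map.of(
        '\u5934', '\u982d',   // U+5934 -> U+982D ("head")
        '\u53d1', '\u767c',   // U+53D1 -> U+767C ("emit/develop"); loses U+9AEE ("hair")
        '\u5c55', '\u5c55'    // U+5C55 is the same in both scripts
    );

    // Phrase table consulted before the character table: "hair" as a whole word.
    static final Map<String, String> PHRASE_TABLE = Map.of(
        "\u5934\u53d1", "\u982d\u9aee"
    );

    // Pure one-to-one conversion, character by character.
    static String naive(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            out.append(CHAR_TABLE.getOrDefault(c, c));
        }
        return out.toString();
    }

    // Tries a phrase match at each position, then falls back to the character table.
    static String phraseAware(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            boolean matched = false;
            for (Map.Entry<String, String> e : PHRASE_TABLE.entrySet()) {
                if (s.startsWith(e.getKey(), i)) {
                    out.append(e.getValue());
                    i += e.getKey().length();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                char c = s.charAt(i);
                out.append(CHAR_TABLE.getOrDefault(c, c));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String hair = "\u5934\u53d1";
        System.out.println(naive(hair));        // uses U+767C, the wrong character for "hair"
        System.out.println(phraseAware(hair));  // uses U+9AEE via the phrase table
    }
}
```

Here `naive` turns 头发 into 頭發, which is wrong in Traditional Chinese, while `phraseAware` produces 頭髮. Both give 發展 for 发展. A real converter would need much larger tables and proper word segmentation, but this is the kind of difference I am asking about.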