Open tomatolog opened 1 month ago
Related issue: https://github.com/manticoresoftware/manticoresearch/issues/2507, where exceptions do not work with morphology='icu_chinese'.
Maybe the upcoming Jieba integration from https://github.com/manticoresoftware/manticoresearch/issues/931 could handle such cases.
Proposal:
It would be better to add support for custom rules to the ICU integration.
It would be better to add support for these options, or some of them, for morphology='icu_chinese', and to prohibit any use of exceptions \ wordforms \ stopwords with morphology='icu_chinese' or ngram_chars.
CJK tokenization depends on the content, but exceptions \ wordforms \ stopwords \ morphology are applied at different stages of the token processing pipeline, where the overall content is already lost.
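
To illustrate the staging problem described above, here is a toy Python sketch. It is not Manticore's or ICU's actual code: the greedy longest-match segmenter, the DICTIONARY contents, and the assumption that exceptions are applied per token after segmentation are all hypothetical and chosen only to show how a rule can silently stop matching once segmentation has already consumed the surrounding content.

```python
# Toy illustration only. The dictionary, the segmenter, and the stage
# order are assumptions for this sketch, not Manticore or ICU internals.

DICTIONARY = {"北京", "大学", "北京大学", "大学生"}  # hypothetical word list
EXCEPTIONS = {"大学生": "student"}                  # hypothetical exception rule


def segment(text, dictionary):
    """Greedy longest-match segmentation.

    Characters that start no dictionary word become single-character tokens.
    """
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def apply_exceptions(tokens, exceptions):
    """Map tokens one by one; this stage no longer sees the original text."""
    return [exceptions.get(token, token) for token in tokens]


# On its own, the exception key is segmented as one token and the rule fires.
alone = apply_exceptions(segment("大学生", DICTIONARY), EXCEPTIONS)

# Inside a longer string, segmentation picks "北京大学" first, so no token
# equals the exception key and the rule is never applied.
embedded = apply_exceptions(segment("北京大学生", DICTIONARY), EXCEPTIONS)
```

In this sketch `alone` is `["student"]`, while `embedded` stays `["北京大学", "生"]`: the same rule behaves differently depending on the surrounding content, which is the kind of mismatch that custom rules inside the segmenter itself would avoid.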
Checklist:
To be completed by the assignee. Check off tasks that have been completed or are not applicable.