roife / emt

Emacs macOS Tokenizer, tokenizing CJK words with macOS's built-in NLP tokenizer.
GNU General Public License v3.0

How to customize the tokenizer default behavior? #1

Open sincebyte opened 3 months ago

sincebyte commented 3 months ago

Excellent work! It seems that NLTokenizer splits the text but does not treat punctuation characters as separate tokens. For example, for emt-lib-path the output is emt, -lib, -path, but I expected emt, -, lib, -, path. Is there any way to change the default behavior?

roife commented 3 months ago

It seems that NLTokenizer does not provide an option for this (see https://developer.apple.com/documentation/naturallanguage/nltokenizer), so we need to handle it manually.

Actually, for English characters (or more precisely, ASCII characters), emt will call Emacs' own forward-word/backward-word.
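
As a rough illustration of what handling this manually could look like, here is a language-neutral sketch in Python. It is not emt's code and not part of the NLTokenizer API. The helper name split_punctuation is hypothetical, and it assumes the tokenizer output is already available as a list of strings. Each token is post-processed so that every punctuation character becomes its own token.

```python
import re

# Hypothetical post-processing step; not part of emt or NLTokenizer.
# Matches either a run of letters/digits (including CJK characters),
# a single punctuation character, or a single underscore.
_PIECES = re.compile(r"[^\W_]+|[^\w\s]|_")


def split_punctuation(tokens):
    """Split each token so that punctuation characters stand alone."""
    result = []
    for token in tokens:
        result.extend(_PIECES.findall(token))
    return result


print(split_punctuation(["emt", "-lib", "-path"]))
# prints ['emt', '-', 'lib', '-', 'path']
```

The same idea would apply in whatever language does the tokenizing: leave the tokenizer's word boundaries as they are, and only subdivide tokens that contain punctuation.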