Closed: Citronelol closed this issue 6 years ago
It appears to be a preference issue: it's matching both 協 and 会 as 接尾 (suffix) before the whole word.
Perhaps the matching algorithm needs to favor longer tokens before splitting into finer matches.
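To illustrate the point about preferring longer tokens, here is a minimal sketch of cost-based segmentation over a toy dictionary. It is not kuromoji.js's actual implementation: `TOY_DICT`, `segment`, and every cost value are hypothetical, and the connection costs between parts of speech that a real Viterbi lattice uses are left out. It only shows how the relative costs decide whether 協会 stays whole or is split into 協 + 会.

```javascript
// Toy dictionary: surface form -> word cost (lower is preferred).
// All entries and costs are invented for illustration; they are NOT IPADIC values.
const TOY_DICT = new Map([
  ["協会", 3],
  ["協", 2],
  ["会", 2],
  ["本", 2],
]);

// Returns the cheapest segmentation of `text` using words in `dict`,
// or null if the text cannot be covered by dictionary words.
function segment(text, dict) {
  const chars = Array.from(text);
  const n = chars.length;
  // best[i] holds the cheapest known way to segment the first i characters.
  const best = new Array(n + 1).fill(null);
  best[0] = { cost: 0, prev: -1, word: null };

  for (let end = 1; end <= n; end++) {
    for (let start = 0; start < end; start++) {
      if (best[start] === null) continue; // prefix not reachable
      const word = chars.slice(start, end).join("");
      if (!dict.has(word)) continue;
      const cost = best[start].cost + dict.get(word);
      if (best[end] === null || cost < best[end].cost) {
        best[end] = { cost, prev: start, word };
      }
    }
  }

  if (best[n] === null) return null;

  // Walk the back-pointers to recover the chosen words.
  const words = [];
  for (let i = n; i > 0; i = best[i].prev) {
    words.unshift(best[i].word);
  }
  return words;
}

console.log(segment("本協会", TOY_DICT)); // [ '本', '協会' ]

// If the whole word is made more expensive than its two parts combined,
// the same algorithm splits it.
const skewedDict = new Map(TOY_DICT);
skewedDict.set("協会", 5);
console.log(segment("本協会", skewedDict)); // [ '本', '協', '会' ]
```

With the first set of costs the path through 協会 is cheaper (2 + 3 = 5) than the path through 協 and 会 (2 + 2 + 2 = 6), so the word stays whole. Raising the cost of 協会 to 5 flips the result, which is the kind of behaviour described above, although the real cause in kuromoji.js may differ.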
@Citronelol I released a fixed version, 0.1.2, and deployed it to the demo site: https://takuyaa.github.io/kuromoji.js/demo/tokenize.html FYI @DJTB
Thanks a lot!
Hi,
I was trying to tokenize the following sentence:
第1条 この法人は、一般社団法人国際銀行協会(以下「本協会」という。)と称し、英文では、 International Bankers Association of Japanと記載する。
and the results differ between the Java version of kuromoji (with the IPADIC dictionary) and the tokenizer provided by kuromoji.js. In particular, the sequence 協会 is split in kuromoji.js.
I saw a closed issue (#16) stating that this could be due to the Viterbi version of the tokenizer. Is there a way to disable it?
Many thanks in advance,
Best