jekyll / classifier-reborn

A general classifier module to allow Bayesian and other types of classifications. A fork of cardmagic/classifier.
https://jekyll.github.io/classifier-reborn/
GNU Lesser General Public License v2.1

In some languages, such as Chinese, words of length 2 or less are very common, so I suppose this is a very strong (and, in other languages, sometimes wrong) assumption. #176

Open · Christophy opened this issue 6 years ago

Christophy commented 6 years ago

https://github.com/jekyll/classifier-reborn/blob/4e807496e69cbb33ce2663564ef287f167915879/lib/classifier-reborn/extensions/hasher.rb#L30

Ch4s3 commented 6 years ago

We could probably make this configurable. I’ll happily review a PR for this.
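One way the threshold could be made configurable is sketched below. It assumes the linked line skips any token whose length is 2 or less; `word_counts` and its `min_length:` keyword are hypothetical names, not part of classifier-reborn's API. The default of 3 reproduces the hard-coded behavior, while `min_length: 1` keeps short Chinese words.

```ruby
# Hypothetical sketch, not classifier-reborn's API:
# count tokens, skipping any token shorter than min_length.
# min_length: 3 mimics an assumed hard-coded `word.length > 2` check.
def word_counts(words, min_length: 3)
  counts = Hash.new(0)
  words.each do |word|
    # String#length counts characters, so "中文" has length 2.
    next if word.length < min_length
    counts[word] += 1
  end
  counts
end

words = %w[classifier 中文 分类 是 cat]
p word_counts(words)                # only "classifier" and "cat" survive
p word_counts(words, min_length: 1) # every token is kept
```

Keeping the old threshold as the default would leave existing English-language users unaffected.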

Ch4s3 commented 6 years ago

@Christophy We just merged https://github.com/jekyll/classifier-reborn/pull/162, which allows for custom tokenizers. Could you let us know if this helps?
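Whether https://github.com/jekyll/classifier-reborn/pull/162 solves this depends on the tokenizer interface it introduced. The sketch below assumes only that a custom tokenizer can be an object that turns a string into a list of token strings; `CjkTokenizer` and its `call` method are hypothetical, not classifier-reborn's API. It keeps runs of ASCII letters and digits as whole words and breaks runs of Han characters into overlapping character bigrams, a common workaround when no Chinese word segmenter is available.

```ruby
# Hypothetical tokenizer, not classifier-reborn's API: any object with a
# #call(str) method that returns an array of token strings.
class CjkTokenizer
  # A run of Han characters, or a run of ASCII letters/digits.
  TOKEN_RUN = /\p{Han}+|[a-z0-9]+/i

  def call(str)
    str.scan(TOKEN_RUN).flat_map do |run|
      if run.match?(/\A\p{Han}+\z/)
        han_bigrams(run)
      else
        [run.downcase]
      end
    end
  end

  private

  # "分类器" -> ["分类", "类器"]; a lone character is kept as-is.
  def han_bigrams(run)
    chars = run.chars
    return chars if chars.size < 2
    chars.each_cons(2).map(&:join)
  end
end

tokenizer = CjkTokenizer.new
p tokenizer.call("Ruby 分类器 works") # "ruby", two Han bigrams, "works"
```

Note that if a length filter like the one linked above still runs after tokenization, two-character tokens would be dropped anyway, so a custom tokenizer and a configurable minimum length may need to go together.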