jekyll / classifier-reborn

A general classifier module to allow Bayesian and other types of classifications. A fork of cardmagic/classifier.
https://jekyll.github.io/classifier-reborn/
GNU Lesser General Public License v2.1
554 stars 110 forks source link

Enable auto_categorize if no initial categories are passed #124

Closed ibnesayeed closed 7 years ago

ibnesayeed commented 7 years ago

I would personally like auto-categorization as the default and preferably the only behavior. This would simplify the API and code base. However, that would be a drastically breaking change. And I do feel that there might be cases where a user might want to limit the categories to a known set. So, I am not suggesting this.

I am suggesting that while keeping the current behavior we add the condition to enable auto_categorize flag if the flag was not passed explicitly during initialization and the initial categories are empty. There is something so convenient and satisfying about having a class that can be instantiated without any arguments. I can create a PR if we agree to go for it.

Ch4s3 commented 7 years ago

auto-categorization as the default and preferably the only behavior.

This would definitely break my usage. :(

I'm not sure I want to make it the default either, as that is a bit of a departure from what's been don since the initial commit.

I am however open to other options for making this part of the api more intuitive. What do you think @ibnesayeed?

ibnesayeed commented 7 years ago

This would definitely break my usage. :(

That's why it was not my suggestion, just a wish.

The suggestion here is just to set auto_categorize to true when no categories are passed at initialization and the flag is not explicitly set to false. Otherwise when one or more initial categories are supplied it should just behave the way it does right now.

Basically, if the categories are not known in advance or if I want to keep my classifier open for new categories as they arrive, I should be able to do this:

require 'classifier-reborn'

classifier = ClassifierReborn::Bayes.new
classifier.train_foo "foo foo foo"
classifier.train_bar "bar bar bar"

Obviously, it is still possible, like this:

classifier = ClassifierReborn::Bayes.new auto_categorize: true

However, it will be very convenient to not pass that explicit argument and intuitively get it working.

So, what is wrong in the existing implementation other than the convenience? Here is what is not very great:

classifier = ClassifierReborn::Bayes.new

The above code will happily initialize the classifier without raising any warnings or errors saying something like No initial categories are provided and the auto categorization is off. However, that classifier will be of no good use as you can not train it with anything.

ibnesayeed commented 7 years ago

PR #128 is safe to merge because it introduces sensible convenience without breaking the backward compatibility.

Because it only taps into the condition when initial categories were empty and auto_categorize flag was not explicitly passed, which was useless situation before so no one would be using in any existing application. Every other condition should behave the same as before.

We do need some test cases though. Any suggestions about what do we want to test?

ibnesayeed commented 7 years ago

Added necessary test cases to cover various possibilities that were not covered already. So, I think it is safe to merge now.