mbatchkarov closed this issue 8 years ago
@athawk81 said he would investigate this, but mentioned that with a skip-attribute probability of .7 and only 4 attributes, it wouldn't be surprising if this led to some stumpy trees, so this behavior might be expected.
The ignoreAttributeProbability is set to .7 by default. Given that there are only 4 attributes, the odds that all attributes will be ignored at a node (and tree building will cease along that branch) are good. My guess is that you are getting shallow trees and, as a consequence, are just predicting the most common class for all training instances. How about setting it to 0 or .2 and seeing what happens? I'll do this myself tomorrow if I don't hear back from you.
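A rough sketch of the arithmetic behind that guess. It assumes each attribute is skipped independently with the same probability at every node, which has not been checked against quickml's implementation; the class and method names are made up for illustration and are not part of quickml. Under that assumption, the chance that all 4 attributes are ignored at a single node is 0.7^4 ≈ 0.24, compared with 0.2^4 = 0.0016 at a setting of .2.

```java
// Back-of-the-envelope calculation only; this is not quickml code.
final class SkipProbability {

    // Probability that all `attributes` candidates are ignored at one node,
    // ASSUMING each is skipped independently with probability `ignoreProbability`.
    static double allIgnored(double ignoreProbability, int attributes) {
        return Math.pow(ignoreProbability, attributes);
    }

    // Probability that a branch passes `depth` consecutive nodes without hitting
    // a node where every attribute was ignored (same independence assumption).
    static double survivesToDepth(double ignoreProbability, int attributes, int depth) {
        return Math.pow(1.0 - allIgnored(ignoreProbability, attributes), depth);
    }

    public static void main(String[] args) {
        // Compare the default (.7) with the two settings suggested above.
        for (double p : new double[] {0.7, 0.2, 0.0}) {
            System.out.printf(
                    "p=%.1f  all 4 ignored at a node: %.4f  branch reaches depth 5: %.4f%n",
                    p, allIgnored(p, 4), survivesToDepth(p, 4, 5));
        }
    }
}
```

If the assumption holds, roughly one node in four stops growing at the default setting, which would be consistent with shallow trees.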
I've set that probability to 0; see line 22 in the commit. Is there another parameter/field that needs to be set? If there is, the docs need an update.
Hey mbatchkarov, apologies for the delay. Please see the test class quickml.supervised.classifier.randomForest.TestIrisAccuracy on the most recent version of master. I created a random forest that gave different classifications for the different instances.
Hey, sorry it has taken me so long to get back to you. I still think the issue has not been resolved. In the first instance, please merge this pull request. In particular, I am interested in the last bit I've added, which calculates accuracy on the training set (somewhat naively). Do make any changes that you feel are appropriate, e.g. tweak the settings of the random forest builder.
As it stands, TestIrisAccuracy is just a smoke test and doesn't have anything to do with accuracy. I'd like TestIrisAccuracy to be a self-contained example of how one gets good performance on the iris data, as well as a test that quickml can do that.
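For reference, a naive training-set accuracy check of the kind described above could look roughly like this. It is a minimal sketch, not the code from the pull request: `Example`, `accuracy`, and `predictionCounts` are hypothetical names, the classifier is stood in for by a plain `Function`, and none of it uses quickml's actual API. The feature values in `main` are made up, not real iris measurements.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, independent of quickml.
final class TrainingSetCheck {

    // One labelled training instance.
    static final class Example {
        final double[] features;
        final String label;

        Example(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    // Fraction of instances whose predicted label equals the true label.
    static double accuracy(List<Example> data, Function<double[], String> predict) {
        if (data.isEmpty()) {
            throw new IllegalArgumentException("no instances to score");
        }
        int correct = 0;
        for (Example e : data) {
            if (e.label.equals(predict.apply(e.features))) {
                correct++;
            }
        }
        return (double) correct / data.size();
    }

    // How often each label is predicted; a single key means a constant predictor.
    static Map<String, Integer> predictionCounts(List<Example> data,
                                                 Function<double[], String> predict) {
        Map<String, Integer> counts = new HashMap<>();
        for (Example e : data) {
            counts.merge(predict.apply(e.features), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Toy data with made-up feature values.
        List<Example> data = Arrays.asList(
                new Example(new double[] {1.0, 2.0, 3.0, 4.0}, "setosa"),
                new Example(new double[] {2.0, 3.0, 4.0, 5.0}, "versicolor"),
                new Example(new double[] {3.0, 4.0, 5.0, 6.0}, "virginica"));
        // Stand-in for a forest that always answers with the same class.
        Function<double[], String> constant = features -> "versicolor";
        System.out.println("accuracy = " + accuracy(data, constant));
        System.out.println("predicted labels = " + predictionCounts(data, constant));
    }
}
```

Asserting both a minimum accuracy and more than one distinct predicted label would turn a smoke test into one that fails when the forest collapses to a single class.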
This is a test to demonstrate #116 has not been resolved in 9ec1c3ff24e0d9291b27c1ac563e0.
I've deliberately increased the maximum depth and number of trees in your test to make sure the classifier will overfit. Despite this, it predicts versicolor for all instances in the training set.