Closed by mbatchkarov 9 years ago.
Agreed, something is fishy with this prediction. My suspicion is that either the default settings of the random forest are not adequate (e.g. the tree depth may be set to 1) or the data set is borked. I am working on a major refactor of TreeBuilder at the moment, but will take a closer look at this issue this weekend.
Hey, the issue was in how the data set was loaded: the attribute values were being loaded as strings when they should have been loaded as Numbers (e.g. Doubles). The default settings are also not well suited to this particular problem. Since there are only 4 attributes, an ignore-attribute probability of 0.7 isn't very effective: if each attribute is ignored independently with that probability, on average only about 1.2 of the 4 attributes remain available at each split. The latest release of QuickML includes the fix for the loading problem.
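For anyone loading the iris CSV by hand, here is a minimal sketch of parsing each row so that the four attribute values end up as Doubles rather than Strings. It does not use QuickML's own loader or API; the class name `IrisRowParser`, the attribute names (`sepalLength` etc.) and the choice of a plain `Map<String, Serializable>` for the attributes are all hypothetical, chosen only for illustration. It also skips blank lines, which avoids the extra empty label produced by a trailing empty line at the end of the file.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IrisRowParser {

    // Hypothetical names for the four numeric iris columns.
    static final String[] ATTRIBUTE_NAMES = {
        "sepalLength", "sepalWidth", "petalLength", "petalWidth"
    };

    /** One parsed row: numeric attributes plus the class label. */
    static final class Row {
        final Map<String, Serializable> attributes;
        final String label;

        Row(Map<String, Serializable> attributes, String label) {
            this.attributes = attributes;
            this.label = label;
        }
    }

    /**
     * Parses CSV lines such as "5.1,3.5,1.4,0.2,Iris-setosa".
     * Attribute values are stored as Double (not String) so a tree learner
     * can split on numeric thresholds. Blank lines are skipped instead of
     * producing an extra, empty label.
     */
    static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // e.g. the trailing empty line at the end of the file
            }
            String[] fields = trimmed.split(",");
            if (fields.length != ATTRIBUTE_NAMES.length + 1) {
                throw new IllegalArgumentException("Malformed row: " + line);
            }
            Map<String, Serializable> attributes = new HashMap<>();
            for (int i = 0; i < ATTRIBUTE_NAMES.length; i++) {
                // Parse as a number; keeping the raw String here was the original bug.
                attributes.put(ATTRIBUTE_NAMES[i], Double.valueOf(fields[i].trim()));
            }
            rows.add(new Row(attributes, fields[ATTRIBUTE_NAMES.length].trim()));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "5.1,3.5,1.4,0.2,Iris-setosa",
            "7.0,3.2,4.7,1.4,Iris-versicolor",
            "6.3,3.3,6.0,2.5,Iris-virginica",
            "" // trailing empty line
        );
        List<Row> rows = parse(lines);
        System.out.println(rows.size());
        System.out.println(rows.get(0).attributes.get("petalLength") instanceof Double);
    }
}
```

Running `main` should print `3` and `true`: the empty line is dropped and the attribute values are Doubles. How the parsed rows are then wrapped into training instances depends on the QuickML version in use, so that step is left out here.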
The example on the QuickML website showing how to train a random forest on the iris data is broken. Consider the following example:
This outputs:
Prediction: {Iris-virginica=0.3333333333333333, Iris-setosa=0.3333333333333333, Iris-versicolor=0.3333333333333333}
The forest is clearly not learning anything. I've observed the same behaviour with a range of other datasets. I am running the latest git version (commit 7584656f32).

PS: I had to remove an empty line at the end of the dataset; otherwise a fourth, empty label is created. This is a separate issue and does not affect the bug above. I might open another issue for that.