sanity / quickml

A fast and easy-to-use decision tree learner in Java
http://quickml.org/
GNU Lesser General Public License v3.0

RandomDecisionForest results are very different from Weka #131

bardhlohaj closed this issue 8 years ago

bardhlohaj commented 8 years ago

I implemented RandomDecisionForest, and the results I'm getting (using getProbability on RandomDecisionForest) are totally different from the results I get with Weka (http://www.cs.waikato.ac.nz/ml/weka/).

Can you tell me whether there is an issue in the RandomDecisionForest implementation, or whether there is a specific way I should set the parameters of the DecisionTreeBuilder and RandomDecisionForest so that I get the same results as with Weka?

sanity commented 8 years ago

Can you elaborate on what you mean by "very different"?

Have you assessed the predictive accuracy of the results relative to Weka (e.g., using RMSE)?
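
For anyone following along, an RMSE comparison of this kind might look like the minimal sketch below. It assumes you have already collected each model's predicted probability for the positive class on the same held-out instances, plus the true 0/1 labels. The class name, method name, and sample numbers are made up for illustration and are not part of quickml or Weka.

```java
// Minimal sketch: compare two models by RMSE on the same held-out instances.
// RmseComparison, rmse, and all sample values are hypothetical illustrations.
final class RmseComparison {

    // Root mean squared error between predicted probabilities and 0/1 labels.
    static double rmse(double[] predicted, double[] actual) {
        if (predicted.length != actual.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        if (predicted.length == 0) {
            throw new IllegalArgumentException("arrays must not be empty");
        }
        double sumSquaredError = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double error = predicted[i] - actual[i];
            sumSquaredError += error * error;
        }
        return Math.sqrt(sumSquaredError / predicted.length);
    }

    public static void main(String[] args) {
        // True labels for four held-out instances (1 = positive class).
        double[] actual = {1.0, 0.0, 1.0, 0.0};
        // Made-up probabilities standing in for each library's output.
        double[] modelA = {0.9, 0.1, 0.8, 0.2};
        double[] modelB = {0.5, 0.5, 0.5, 0.5};

        System.out.printf("model A RMSE: %.4f%n", rmse(modelA, actual));
        System.out.printf("model B RMSE: %.4f%n", rmse(modelB, actual));
    }
}
```

Lower RMSE on the same held-out data indicates better-calibrated probabilities, so two models can produce quite different raw probabilities while one of them is still clearly more accurate.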

athawk81 commented 8 years ago

bardhlohaj, the discrepancy is most certainly coming from different default settings, and perhaps slight algorithmic variations, between our package and Weka. RandomDecisionForestBuilder takes a DecisionTreeBuilder as an argument. The default hyperparameters for DecisionTreeBuilder are very simple (and probably inappropriate for most data sets), but they are exceptionally flexible (more so than Weka's) and automatically tunable with the PredictiveModelOptimizer.