This pull request addresses #105
Summary of the changes
This pull request adds a first version of the RandomForest implementation to StreamDM. It is based on the algorithm defined in this paper, without the drift detector and background learner concepts. The PR adds one new class, RandomForest.scala, and changes the existing classes Node.scala and HoeffdingTree.scala.
Tests
All tests use the electNormNew.arff dataset (available in the project's /data directory).
Expected output for every test: 100 rows of statistics in the results_*.txt file.
Number of trees = 10
Number of trees = 100
Maximum depth = 5
Node learner = majority vote (-l 0)
m = 2
m = 30% (Percentage mode: M * (m / 100) features)
m = -20% (a negative percentage, which is treated as 80%)
m = All
m = more than the number of available features (-m 60); should fall back to using all available features
m = sqrt(M) + 1; should use the square root of the total number of features, plus 1
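
For reviewers, the m settings exercised above can be summarised with a small sketch. This is not the code in RandomForest.scala: the names FeatureMode, Specified, Percentage, SqrtPlusOne, AllFeatures and SubspaceSize.resolve are hypothetical, and the rounding and the lower bound of 1 are assumptions. It only illustrates how each test value of m is expected to map to a number of features out of M.

```scala
// Hypothetical modes for choosing m; illustrative names, not StreamDM identifiers.
sealed trait FeatureMode
case object Specified extends FeatureMode    // m is an absolute number of features
case object Percentage extends FeatureMode   // m is a percentage of M
case object SqrtPlusOne extends FeatureMode  // m = sqrt(M) + 1
case object AllFeatures extends FeatureMode  // m = M

object SubspaceSize {
  /** Resolve how many features each tree considers per split.
    *
    * @param mode          how to interpret m
    * @param m             the user-supplied value (ignored for SqrtPlusOne and AllFeatures)
    * @param totalFeatures M, the number of available features
    */
  def resolve(mode: FeatureMode, m: Int, totalFeatures: Int): Int = {
    require(totalFeatures > 0, "totalFeatures must be positive")
    val raw = mode match {
      case Specified => m
      case Percentage =>
        // A negative percentage is read as 100 + m, so -20 becomes 80.
        val pct = if (m < 0) 100 + m else m
        // Assumption: round to the nearest integer.
        math.round(totalFeatures * pct / 100.0).toInt
      case SqrtPlusOne =>
        // Assumption: round sqrt(M) before adding 1.
        math.round(math.sqrt(totalFeatures.toDouble)).toInt + 1
      case AllFeatures => totalFeatures
    }
    // Requests above M fall back to all features; assumed lower bound of 1.
    math.max(1, math.min(raw, totalFeatures))
  }
}
```

With an assumed M = 8, SubspaceSize.resolve(Specified, 60, 8) gives 8, matching the -m 60 test, and SubspaceSize.resolve(Percentage, -20, 8) gives 6, the same as a request for 80%.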