huawei-noah / streamDM

Stream Data Mining Library for Spark Streaming
http://streamdm.noahlab.com.hk/
Apache License 2.0

StreamDM-84: Updates to Bagging #85

Closed hmgomes closed 6 years ago

hmgomes commented 6 years ago

Summary of the changes

Update the Bagging implementation to use Hoeffding Trees as its base learner, and set the default base learner in Bagging to trees.HoeffdingTree. This brings the Bagging implementation closer to the current default implementation in MOA (see OzaBag.java).
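For readers unfamiliar with the OzaBag approach mentioned above, here is a minimal sketch of online bagging, in which every ensemble member trains on each incoming example with a weight drawn from Poisson(1). It is an illustration only, not the streamDM or MOA code: `OnlineLearner`, `MajorityClass`, `OnlineBagging` and `poisson` are hypothetical names, and the majority-class learner is just a stand-in for a Hoeffding tree.

```scala
import scala.util.Random

object OnlineBaggingSketch {
  // Hypothetical minimal learner interface; NOT the streamDM API.
  trait OnlineLearner {
    def train(features: Array[Double], label: Int, weight: Int): Unit
    def predict(features: Array[Double]): Int
  }

  // Toy base learner (stand-in for a Hoeffding tree): predicts the
  // class with the largest accumulated weight seen so far.
  class MajorityClass(numClasses: Int) extends OnlineLearner {
    private val counts = Array.fill(numClasses)(0L)
    def train(features: Array[Double], label: Int, weight: Int): Unit =
      counts(label) += weight
    def predict(features: Array[Double]): Int = counts.indexOf(counts.max)
  }

  // Knuth's multiplication method for sampling Poisson(lambda);
  // adequate for small lambda such as 1.0.
  def poisson(lambda: Double, rng: Random): Int = {
    val limit = math.exp(-lambda)
    var k = 0
    var p = rng.nextDouble()
    while (p > limit) {
      k += 1
      p *= rng.nextDouble()
    }
    k
  }

  // Online bagging: each member sees each example k ~ Poisson(1) times,
  // which approximates bootstrap resampling on a stream.
  class OnlineBagging(members: Seq[OnlineLearner], numClasses: Int, seed: Long)
      extends OnlineLearner {
    private val rng = new Random(seed)

    def train(features: Array[Double], label: Int, weight: Int): Unit =
      members.foreach { m =>
        val k = poisson(1.0, rng) * weight
        if (k > 0) m.train(features, label, k)
      }

    // Majority vote over the members' predicted classes.
    def predict(features: Array[Double]): Int = {
      val votes = Array.fill(numClasses)(0)
      members.foreach(m => votes(m.predict(features)) += 1)
      votes.indexOf(votes.max)
    }
  }
}
```

In this sketch the ensemble itself implements `OnlineLearner`, so any base learner with the same interface could be swapped in for `MajorityClass`.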

Classes affected by the changes

Bagging

Changed the default base learner to trees.HoeffdingTree.

HoeffdingTreeModel

Added an implementation of proba(example: Example): Double, since HoeffdingTreeModel now extends ClassificationModel instead of Model. This allows HoeffdingTree to be used as a base learner for Bagging.
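The following sketch illustrates why the supertype matters: an ensemble that holds its members as ClassificationModel can only accept models that extend that trait. These are simplified stand-ins rather than the streamDM sources. The `proba` signature is taken from the description above, while the `predict` method, the `Example` fields, and the `HoeffdingTreeModelSketch` and `BaggingSketch` classes are assumptions made for illustration.

```scala
object BaseLearnerTypingSketch {
  // Simplified stand-in for streamDM's Example (fields are assumed).
  final case class Example(features: Array[Double])

  trait Model

  trait ClassificationModel extends Model {
    // Assumed method: returns the predicted class label.
    def predict(example: Example): Double
    // Signature as quoted in this PR; its exact semantics are assumed here.
    def proba(example: Example): Double
  }

  // A model that only extends Model would be rejected by BaggingSketch
  // below at compile time; extending ClassificationModel makes it usable.
  class HoeffdingTreeModelSketch(constantLabel: Double) extends ClassificationModel {
    def predict(example: Example): Double = constantLabel
    def proba(example: Example): Double = 1.0 // placeholder value
  }

  // Hypothetical ensemble that needs predictions from every member.
  class BaggingSketch(members: Seq[ClassificationModel]) {
    // Majority vote over member predictions (ties broken arbitrarily).
    def predict(example: Example): Double =
      members.map(_.predict(example)).groupBy(identity).maxBy(_._2.size)._1
  }
}
```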

ClassificationModel

Updated the documentation to refer to Example instead of Instance.

Tests

  1. Explicitly defining the base learner as the HoeffdingTree.

    • Run:
      ./spark.sh "200 EvaluatePrequential -l (meta.Bagging -l trees.HoeffdingTree) -s (FileReader -f ../data/elecNormNew.arff -k 4532 -d 10 -i 45312) -e (BasicClassificationEvaluator -c -m) -h" 1> result_elec.txt 2> log_elec.log
    • Output: result_elec.txt should contain the classification performance results.
  2. Implicitly using HoeffdingTree as the base learner for Bagging.

    • Run:
      ./spark.sh "200 EvaluatePrequential -l meta.Bagging -s (FileReader -f ../data/elecNormNew.arff -k 4532 -d 10 -i 45312) -e (BasicClassificationEvaluator -c -m) -h" 1> result_elec.txt 2> log_elec.log
    • Output: result_elec.txt should contain the classification performance results.