huawei-noah / streamDM

Stream Data Mining Library for Spark Streaming
http://streamdm.noahlab.com.hk/
Apache License 2.0

StreamDM-86: Adding runtime estimation to BasicClassificationEvaluator #87

Closed by hmgomes 6 years ago

hmgomes commented 6 years ago

Summary of the changes

This pull request addresses #86.

BasicClassificationEvaluator

Added a Runtime column to the output. The value is currently a rough estimate of runtime, measured with System.nanoTime().
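For readers unfamiliar with this kind of measurement, a rough estimate based on System.nanoTime() can be sketched as below. This is only an illustration and not the code in this pull request: the object RuntimeSketch and the helpers timed and nanosToSeconds are hypothetical names that do not exist in streamDM.

```scala
// Minimal sketch of a rough runtime estimate (hypothetical helper, not streamDM code).
object RuntimeSketch {

  /** Runs `block` once and returns its result with the elapsed time in nanoseconds. */
  def timed[A](block: => A): (A, Long) = {
    val start = System.nanoTime()                 // monotonic clock, suited to measuring intervals
    val result = block                            // evaluate the by-name argument exactly once
    val elapsedNanos = System.nanoTime() - start
    (result, elapsedNanos)
  }

  /** Converts a nanosecond count to seconds for reporting. */
  def nanosToSeconds(nanos: Long): Double = nanos / 1e9
}
```

A call such as `val (sum, nanos) = RuntimeSketch.timed { (1 to 1000).sum }` returns the computed value together with the elapsed time. The figure includes JVM warm-up and scheduling noise, which is why it is only a rough estimate.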

HoeffdingTree

Changed the output of a log message (now using logInfo()).

Tests

Using datasets elecNormNew.arff and covtypeNorm.arff.

  1. Verify output of a binary classification problem (elecNormNew)

    • Run: ./spark.sh "200 EvaluatePrequential -l (trees.HoeffdingTree -l 0 -t 0.05 -g 200 -o) -s (FileReader -f ../data/elecNormNew.arff -k 454 -d 10 -i 45312) -e (BasicClassificationEvaluator -c -m) -h" 1> results_binary.txt 2> log_binary.log

    • Output: results_binary.txt should contain the classification performance results, including the Runtime column.

  2. Verify output of a multiclass classification problem (covtypeNorm)

    • Run: ./spark.sh "200 EvaluatePrequential -l (trees.HoeffdingTree -l 0 -t 0.05 -g 200 -o) -s (FileReader -f ../data/covtypeNorm.arff -k 5810 -d 10 -i 581012) -e (BasicClassificationEvaluator -c -m) -h" 1> results_multiclass.txt 2> log_multiclass.log

    • Output: results_multiclass.txt should contain the classification performance results, including the Runtime column.