huawei-noah / streamDM

Stream Data Mining Library for Spark Streaming
http://streamdm.noahlab.com.hk/
Apache License 2.0

Update numeric features #68

Closed nhnminh closed 7 years ago

nhnminh commented 7 years ago

This commit makes 2 changes in class GaussianNumericFeatureClassObserver and 1 change in trait FeatureClassObserver, in file FeatureClassObserver.scala, and adds 1 dataset (electNormNew.arff).

##############

  1. [Important] Change in function bestSplit, class GaussianNumericFeatureClassObserver:

Content:

Reason:

To verify this, you could run:

./spark.sh "EvaluatePrequential  -l (trees.HoeffdingTree -l 0 -t 0.05 -g 200 -o) -s (FileReader -f ../data/electNormNew.arff -k 4000 -d 10)" 1> result.res 2> log.log

A small note: for the HoeffdingTree to grow, you need to turn on the "growthAllowed" parameter, which is set with the -o flag.

#############

  2. Change in function splitPoints:

Content:

Reason: to imitate MOA's function for finding split points.
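For readers unfamiliar with the MOA approach, here is a minimal sketch of the general idea as I understand it: candidate split points are spaced evenly between the smallest and largest values observed for the feature, and candidates that do not fall strictly inside that range are dropped. The object name `SplitPointSketch`, the method signature, and the `numBins` parameter are hypothetical and chosen for illustration only; this is not the code in this commit, nor streamDM's or MOA's actual API.

```scala
// Illustrative sketch only: SplitPointSketch and its parameters are
// hypothetical names, not part of streamDM or MOA.
object SplitPointSketch {

  /** Returns up to `numBins` evenly spaced candidate split points that lie
    * strictly between the observed minimum and maximum of a numeric feature.
    */
  def splitPoints(minValue: Double, maxValue: Double, numBins: Int): Array[Double] = {
    // Assumption: an unobserved feature keeps its minimum at +infinity,
    // in which case there is nothing to suggest.
    if (minValue.isPosInfinity || numBins <= 0) Array.empty[Double]
    else {
      val range = maxValue - minValue
      (1 to numBins)
        .map(i => minValue + range * i / (numBins + 1.0))
        // Drop degenerate candidates, e.g. when min == max.
        .filter(v => v > minValue && v < maxValue)
        .distinct
        .toArray
    }
  }
}
```

For example, `SplitPointSketch.splitPoints(0.0, 10.0, 4)` yields the candidates 2.0, 4.0, 6.0 and 8.0, while a feature whose minimum equals its maximum yields no candidates at all.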

##############

  3. Change in trait FeatureClassObserver:

Content:

Reason:

##############

  4. Add dataset "Electricity" (electNormNew.arff):

Reason:

To test all of these changes, simply run this command:

./spark.sh "EvaluatePrequential  -l (trees.HoeffdingTree -l 0 -t 0.05 -g 200 -o) -s (FileReader -f ../data/electNormNew.arff -k 4000 -d 10)" 1> result.res 2> log.log