huawei-noah / streamDM

Stream Data Mining Library for Spark Streaming
http://streamdm.noahlab.com.hk/
Apache License 2.0

update Gaussian Estimator and Entropy in SplitCriterion #70

Closed: nhnminh closed this pull request 7 years ago.

nhnminh commented 7 years ago

This pull request makes two small changes: it updates the Gaussian Estimator and the entropy computation in SplitCriterion.

  1. Update Gaussian Estimator:
    • Before: Statistics.normalProbability((splitValue - getMean()) / stdDev())
    • Now: Statistics.normalProbability((splitValue - getMean()) / stdDev()) * weightSum - eqWeight

Reason: match MOA's formula for computing the Gaussian Estimator's weights.
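The updated formula can be read as the weight expected to fall strictly below the split value: the normal CDF at the standardized split, scaled by the total weight, minus the weight assigned to the split value itself. The Scala sketch below shows how the three weights (less than, equal to, greater than) might fit together, assuming a MOA-style estimator that tracks a mean, a standard deviation, and a total weight. The names `GaussianWeightsSketch`, `splitWeights`, and `probabilityDensity`, the zero-variance fallback, and the Abramowitz and Stegun approximation used in place of `Statistics.normalProbability` are illustrative assumptions, not streamDM's actual code.

```scala
// Hypothetical sketch of MOA-style split weights for a Gaussian estimator.
object GaussianWeightsSketch {
  private val NormalConstant = math.sqrt(2.0 * math.Pi)

  // Standard normal CDF via the Abramowitz and Stegun 7.1.26 erf approximation
  // (erf error below about 1.5e-7). Stands in for Statistics.normalProbability,
  // whose implementation is not shown in the PR.
  def normalProbability(z: Double): Double = {
    val x = math.abs(z) / math.sqrt(2.0)
    val t = 1.0 / (1.0 + 0.3275911 * x)
    val poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
      t * (-1.453152027 + t * 1.061405429))))
    val erf = 1.0 - poly * math.exp(-x * x)
    if (z >= 0) 0.5 * (1.0 + erf) else 0.5 * (1.0 - erf)
  }

  // Gaussian density at `value`; with zero variance (assumption) all mass sits on the mean.
  def probabilityDensity(value: Double, mean: Double, stdDev: Double): Double =
    if (stdDev > 0.0) {
      val diff = value - mean
      math.exp(-(diff * diff) / (2.0 * stdDev * stdDev)) / (NormalConstant * stdDev)
    } else if (value == mean) 1.0
    else 0.0

  // Returns (lessThan, equalTo, greaterThan) weights around splitValue.
  def splitWeights(splitValue: Double, mean: Double, stdDev: Double,
                   weightSum: Double): (Double, Double, Double) = {
    val eqWeight = probabilityDensity(splitValue, mean, stdDev) * weightSum
    // The "Now" formula from the PR: CDF * weightSum - eqWeight.
    val ltWeight =
      if (stdDev > 0.0) normalProbability((splitValue - mean) / stdDev) * weightSum - eqWeight
      else if (splitValue < mean) weightSum - eqWeight
      else 0.0
    // Whatever remains goes above the split, clamped at zero.
    val gtWeight = math.max(0.0, weightSum - eqWeight - ltWeight)
    (ltWeight, eqWeight, gtWeight)
  }
}
```

With mean 0, standard deviation 1, and a total weight of 100, splitting at 0 gives an equal-to weight of about 39.89, a less-than weight of about 10.11, and a greater-than weight of about 50, so the three parts sum back to the total.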

  2. Entropy in SplitCriterion:
    • Before: it does not return 0.0 when pre is null, its sum is <= 0, or hasNegative is true; it computes the entropy anyway.
    • After: an else clause is added, so those cases return 0.0.

Reason: match MOA's equation.
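The missing else is a Scala pitfall: an `if` without an `else` in statement position evaluates its branch and discards the result, so `0.0` is never returned and execution falls through to the entropy formula. The sketch below reproduces both versions, assuming the method computes the base-2 entropy of a count array `pre` as log2(sum) minus sum(x * log2(x)) / sum. The object name `EntropySketch`, the `log2` helper, and implementing `hasNegative` as a simple predicate are illustrative assumptions, not streamDM's actual code.

```scala
// Hypothetical sketch of the entropy guard before and after the fix.
object EntropySketch {
  // Base-2 logarithm (assumed helper).
  def log2(x: Double): Double = math.log(x) / math.log(2.0)

  // True if any count is negative (assumed to match the PR's hasNegative check).
  def hasNegative(pre: Array[Double]): Boolean = pre.exists(_ < 0.0)

  // Pre-fix shape: the `if` has no `else`, so its 0.0 is computed and thrown
  // away (the compiler warns about a pure expression in statement position),
  // and the entropy formula below always runs.
  def entropyBefore(pre: Array[Double]): Double = {
    if (pre == null || pre.sum <= 0 || hasNegative(pre)) 0.0
    log2(pre.sum) - pre.filter(_ > 0).map(x => x * log2(x)).sum / pre.sum
  }

  // Post-fix shape: the `else` makes the guard the value of the whole method.
  def entropyAfter(pre: Array[Double]): Double =
    if (pre == null || pre.sum <= 0 || hasNegative(pre)) 0.0
    else log2(pre.sum) - pre.filter(_ > 0).map(x => x * log2(x)).sum / pre.sum
}
```

For an input with a negative entry such as `Array(-1.0, 2.0)`, the pre-fix version returns -2.0 instead of 0.0, and for `null` it would throw a NullPointerException; with the `else` in place, the guard short-circuits as intended, and `entropyAfter(Array(5.0, 5.0))` gives 1.0 bit up to floating-point rounding.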

Testing these changes:

Run this command to check that the code still runs:

    ./spark.sh "EvaluatePrequential -l (trees.HoeffdingTree -l 0 -t 0.05 -g 200 -o) -s (FileReader -f ../data/electNormNew.arff -k 4000 -d 10)" 1> resu.res 2> log.log