huawei-noah / streamDM

Stream Data Mining Library for Spark Streaming
http://streamdm.noahlab.com.hk/
Apache License 2.0

Changes in Statistics.scala to correctly compute statistics #66

Closed: nhnminh closed this 7 years ago.

nhnminh commented 7 years ago

Fixes a mistake in polevl (used to compute the normal probability in Statistics.scala): the loop should run until N, not N+1.
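For reference, polevl in the Cephes library (which MOA's Statistics code is ported from) evaluates a degree-N polynomial from N + 1 coefficients with Horner's rule, so the loop body runs exactly N times; p1evl does the same with an implicit leading coefficient of 1 and only N stored coefficients. The following is a minimal Scala sketch of that convention, not the StreamDM code itself; the object name PolevlSketch and the require checks are illustrative additions.

```scala
object PolevlSketch {
  // Evaluates coef(0)*x^n + coef(1)*x^(n-1) + ... + coef(n) with Horner's rule.
  // coef holds n + 1 coefficients; the loop covers i = 1..n (n iterations),
  // so it never reads coef(n + 1).
  def polevl(x: Double, coef: Array[Double], n: Int): Double = {
    require(coef.length >= n + 1, "coef must hold n + 1 coefficients")
    var ans = coef(0)
    var i = 1
    while (i <= n) {
      ans = ans * x + coef(i)
      i += 1
    }
    ans
  }

  // Evaluates x^n + coef(0)*x^(n-1) + ... + coef(n-1): the leading
  // coefficient 1.0 is implicit, so coef holds only n entries.
  def p1evl(x: Double, coef: Array[Double], n: Int): Double = {
    require(n >= 1 && coef.length >= n, "coef must hold n coefficients")
    var ans = x + coef(0)
    var i = 1
    while (i < n) {
      ans = ans * x + coef(i)
      i += 1
    }
    ans
  }
}
```

For example, polevl(2.0, Array(1.0, 2.0, 3.0), 2) evaluates x^2 + 2x + 3 at x = 2 and yields 11.0; running the loop one step further would index past the end of the coefficient array.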

This ensures that StreamDM computes the normal distribution with the same values as MOA. The normal distribution is used by the Gaussian estimator to compute the binary split (binarySplit) for numeric features (FeatureClassObserver.scala).
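To illustrate how the normal CDF feeds a numeric binary split, here is a simplified Scala sketch: for one class with a running mean and standard deviation, the weight expected at or below a candidate threshold is totalWeight * Phi((value - mean) / stdDev), and the rest goes to the right branch. The names GaussianSplitSketch and splitWeights are hypothetical, the erf approximation (Abramowitz & Stegun formula 7.1.26) stands in for the Cephes-based routine, and the zero-deviation handling is an assumption; StreamDM's FeatureClassObserver.scala and MOA's estimator may, for instance, also separate out the weight equal to the threshold.

```scala
object GaussianSplitSketch {
  // erf via Abramowitz & Stegun 7.1.26 (absolute error below about 1.5e-7).
  // Stand-in chosen for brevity; NOT the polevl/p1evl-based routine used in
  // Statistics.scala or MOA.
  def erf(x: Double): Double = {
    val sign = if (x < 0) -1.0 else 1.0
    val ax = math.abs(x)
    val t = 1.0 / (1.0 + 0.3275911 * ax)
    // a1*t + a2*t^2 + ... + a5*t^5, written in Horner form
    val poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
      t * (-1.453152027 + t * 1.061405429))))
    sign * (1.0 - poly * math.exp(-ax * ax))
  }

  // Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2.
  def normalProbability(z: Double): Double =
    0.5 * (1.0 + erf(z / math.sqrt(2.0)))

  // Splits one class's total weight at threshold `value`, assuming its feature
  // values follow N(mean, stdDev^2). Returns (left, right), where left estimates
  // the weight with feature value <= threshold.
  def splitWeights(mean: Double, stdDev: Double, totalWeight: Double,
                   value: Double): (Double, Double) = {
    val left =
      if (stdDev > 0.0) totalWeight * normalProbability((value - mean) / stdDev)
      else if (value >= mean) totalWeight // assumption: all mass sits at the mean
      else 0.0
    (left, totalWeight - left)
  }
}
```

Because this probability is evaluated for every class at every candidate threshold, any error in the underlying normal-probability routine shifts the left/right weights and, with them, the split criterion computed from those weights.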

To check that the change runs, the following test can be used (standard output is redirected to result.res and standard error to log.log):

./spark.sh "EvaluatePrequential -l (trees.HoeffdingTree -l 0 -t 0.05 -g 200) -s (FileReader -f ../data/iris.arff -k 1000 -d 10)" 1> result.res 2> log.log