huawei-noah / streamDM

Stream Data Mining Library for Spark Streaming
http://streamdm.noahlab.com.hk/
Apache License 2.0

Update NominalFeatureObserver #67

Closed nhnminh closed 7 years ago

nhnminh commented 7 years ago

There are two changes in this commit, both to the function bestSplit of class NominalFeatureClassObserver in file FeatureClassObserver.scala.

1. Change the logic:

Reason: This mirrors the logical structure MOA uses when evaluating nominal features: if binaryOnly is not turned on, the multiway split is evaluated first.

2. Iteration:

Reason: This is what MultiwaySplit should do:


With these two changes, NominalFeatureObserver should produce the same output as MOA for a given nominal feature, namely the bestSplit with the highest merit.
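The control flow described above can be sketched as follows. This is a minimal, self-contained illustration and not the code from this commit: `SplitCandidate`, `NominalSplitSketch`, the `Counts` layout and the use of information gain as the merit are all assumptions made for the example, and none of them are streamDM or MOA types. The sketch evaluates the multiway split first unless `binaryOnly` is set, then iterates over every feature value to evaluate the corresponding binary split, and returns the candidate with the highest merit.

```scala
// Hypothetical stand-in for a split suggestion; NOT a streamDM or MOA class.
final case class SplitCandidate(description: String, merit: Double)

object NominalSplitSketch {
  // Assumed layout: counts(c)(v) = weight seen for class c with feature value v.
  type Counts = Array[Array[Double]]

  // Entropy (in bits) of a class-weight distribution.
  def entropy(dist: Array[Double]): Double = {
    val total = dist.sum
    if (total <= 0.0) 0.0
    else dist.filter(_ > 0.0).map { w =>
      val p = w / total
      -p * math.log(p) / math.log(2.0)
    }.sum
  }

  // Information gain, used here as the merit (an assumption for this sketch).
  def infoGain(pre: Array[Double], post: Seq[Array[Double]]): Double = {
    val total = post.map(_.sum).sum
    if (total <= 0.0) 0.0
    else entropy(pre) - post.map(b => b.sum / total * entropy(b)).sum
  }

  def bestSplit(counts: Counts, binaryOnly: Boolean): Option[SplitCandidate] = {
    val numValues = if (counts.isEmpty) 0 else counts(0).length
    val pre = counts.map(_.sum) // class distribution before splitting

    // Step 1: unless binaryOnly is set, evaluate the multiway split first
    // (one branch per feature value).
    val multiway: Seq[SplitCandidate] =
      if (binaryOnly || numValues == 0) Seq.empty
      else {
        val branches = (0 until numValues).map(v => counts.map(_(v)))
        Seq(SplitCandidate("multiway", infoGain(pre, branches)))
      }

    // Step 2: iterate over every value and evaluate "value == v" vs. the rest.
    val binary = (0 until numValues).map { v =>
      val inside = counts.map(_(v))
      val outside = counts.map(row => row.sum - row(v))
      SplitCandidate(s"value == $v", infoGain(pre, Seq(inside, outside)))
    }

    // Keep the candidate with the highest merit; on ties the earlier
    // candidate (the multiway split, if present) is kept.
    val all = multiway ++ binary
    if (all.isEmpty) None else Some(all.maxBy(_.merit))
  }
}
```

For example, with three classes that are each tied to exactly one feature value, `NominalSplitSketch.bestSplit(counts, binaryOnly = false)` selects the multiway candidate, while `binaryOnly = true` falls back to the best binary candidate.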


To check that the code still runs after this change, run the following command:

./spark.sh "EvaluatePrequential -l (trees.HoeffdingTree -l 0 -t 0.05 -g 200) -s (FileReader -f ../data/randomtreesampledata -k 10 -d 10)" > result.res 2> log.log