This commit makes two changes to function bestSplit, class NominalFeatureClassObserver, file FeatureClassObserver.scala.
1. Change the logic:
Before: do the binarySplit first, then check if (binarySplit = false), create MultiwaySplit later.
Now: check if (binarySplit = false) first, create MultiwaySplit; then do the binarySplit later.
Reason: Imitate logical structure of MOA in computing NominalFeatures: if binaryOnly is not turned on, then do the multiwaySplit first.
2. Iteration:
Before: Loop until pre.length (size of the pre-split distribution, which equals the number of classes).
After: Loop until numFeatureValues (number of values of that Nominal Feature)
Reason:
This is what MultiwaySplit should do:
For each value of NominalFeature, compute the merit.
Compare all merits to select the best split.
With these two changes, NominalFeatureObserver should produce the same output (the bestSplit with the highest merit) for a given Nominal Feature as MOA does.
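The two changes can be illustrated with a minimal, self-contained sketch. It is not the actual code of FeatureClassObserver.scala: the object NominalSplitSketch, the case class Split, the counts(v)(c) layout and the use of information gain as the merit are all assumptions made for illustration. It only shows the order of the checks (multiway first when binaryOnly is off) and the loop bound (numFeatureValues rather than pre.length).

```scala
object NominalSplitSketch {
  // Shannon entropy (base 2) of a class-count distribution.
  def entropy(dist: Array[Double]): Double = {
    val total = dist.sum
    if (total <= 0.0) 0.0
    else dist.filter(_ > 0.0).map { c =>
      val p = c / total
      -p * math.log(p) / math.log(2.0)
    }.sum
  }

  // Information gain of splitting `pre` into the branches in `post`
  // (assumed merit; the real split criterion may differ).
  def infoGain(pre: Array[Double], post: Array[Array[Double]]): Double = {
    val total = post.map(_.sum).sum
    if (total <= 0.0) 0.0
    else entropy(pre) - post.map(b => b.sum / total * entropy(b)).sum
  }

  // Hypothetical result type: which split won and its merit.
  final case class Split(kind: String, merit: Double)

  // counts(v)(c): weight observed for feature value v and class c (assumed layout).
  def bestSplit(pre: Array[Double],
                counts: Array[Array[Double]],
                binaryOnly: Boolean): Split = {
    val numFeatureValues = counts.length
    var best = Split("none", Double.NegativeInfinity)

    // Change 1: handle the multiway split first when binaryOnly is off.
    if (!binaryOnly) {
      // Change 2: one branch per feature value, not per class.
      val multiway = Array.tabulate(numFeatureValues)(v => counts(v).clone())
      best = Split("multiway", infoGain(pre, multiway))
    }

    // Then the binary splits: "value v" versus "all other values".
    for (v <- 0 until numFeatureValues) {
      val rest = Array.tabulate(pre.length)(c => pre(c) - counts(v)(c))
      val merit = infoGain(pre, Array(counts(v), rest))
      if (merit > best.merit) best = Split(s"binary($v)", merit)
    }
    best
  }
}
```

In this sketch, a feature with three values and two classes makes the difference visible: looping until pre.length would visit only two of the three values, while looping until numFeatureValues visits all of them.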
To check that the code still runs with this change, we can run the following command:
./spark.sh "EvaluatePrequential -l (trees.HoeffdingTree -l 0 -t 0.05 -g 200) -s (FileReader -f ../data/randomtreesampledata -k 10 -d 10)" > result.res 2> log.log