This commit makes 2 changes in class GaussianNumericFeatureClassObserver and 1 change in trait FeatureClassObserver, both in file FeatureClassObserver.scala, and adds 1 dataset (electNormNew.arff).
##############
[Important] Change in function bestSplit, class GaussianNumericFeatureClassObserver:
Content:
Add a check for whether the split points are null. If they are, return a non-null FeatureSplit with negative-infinity merit.
Reason:
Previously, StreamDM could not handle datasets containing numeric features whose max equals min (that is, features with only one value).
For example, in the Electricity dataset, the numeric features "vicprice" and "vicdemand" take only the values 0.003467 and 0.422915, respectively.
For such features, the previous implementation could not find any split points and threw a NullPointerException, so the program could not run on this dataset.
The same issue occurs with the KDD99 dataset.
With the change described above, StreamDM works with the Electricity dataset and with similar datasets. It writes a warning to the log file that features with null split points exist.
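The null check in bestSplit can be illustrated with a minimal, self-contained sketch. It does not reproduce the StreamDM code: `SplitSketch`, `BestSplitSketch`, and the `merit` function parameter are hypothetical stand-ins for FeatureSplit and the split criterion, and the extra empty-array check is an assumption added so the sketch is safe on its own.

```scala
// Hypothetical, simplified stand-in for StreamDM's FeatureSplit:
// only the merit and the chosen split point are modelled here.
final case class SplitSketch(merit: Double, splitPoint: Option[Double])

object BestSplitSketch {
  // `merit` is a hypothetical scoring function; a real observer would
  // score each candidate with the configured split criterion.
  def bestSplit(splitPoints: Array[Double], merit: Double => Double): SplitSketch = {
    if (splitPoints == null || splitPoints.isEmpty) {
      // No candidates (e.g. the feature has max == min): return a non-null
      // result whose merit can never win a comparison against a real split.
      SplitSketch(Double.NegativeInfinity, None)
    } else {
      val best = splitPoints.maxBy(merit)
      SplitSketch(merit(best), Some(best))
    }
  }
}
```

Because the returned merit is negative infinity, a caller that picks the split with the highest merit across features would simply never select such a feature, instead of failing on a null reference.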
To verify this, run:
./spark.sh "EvaluatePrequential -l (trees.HoeffdingTree -l 0 -t 0.05 -g 200 -o) -s (FileReader -f ../data/electNormNew.arff -k 4000 -d 10)" 1> result.res 2> log.log
Note: for the HoeffdingTree to grow, you need to turn on the "growthAllowed" parameter, which is set with the -o flag.
##############
Content:
Reason: imitate MOA's function for finding split points.
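As a rough illustration of how split-point candidates can be proposed, the sketch below assumes they are evenly spaced values strictly between the observed minimum and maximum of a feature. This is an assumption about the general approach, not a copy of MOA's or StreamDM's code; `SplitPointSketch`, `splitPointSuggestions`, and `numBins` are hypothetical names.

```scala
object SplitPointSketch {
  // Propose up to `numBins` evenly spaced candidates strictly inside (min, max).
  // Assumption: candidates equal to min or max are discarded, so a feature
  // with max == min yields no candidates at all.
  def splitPointSuggestions(min: Double, max: Double, numBins: Int): Array[Double] = {
    val range = max - min
    (1 to numBins)
      .map(i => min + range / (numBins + 1.0) * i)
      .filter(v => v > min && v < max)
      .toArray
  }
}
```

Under these assumptions, a single-valued feature such as "vicprice" in the Electricity dataset produces no candidates. The sketch returns an empty array in that case, whereas the description above refers to null split points.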
##############
Content:
Reason:
##############
Reason:
To test all of these changes, run:
./spark.sh "EvaluatePrequential -l (trees.HoeffdingTree -l 0 -t 0.05 -g 200 -o) -s (FileReader -f ../data/electNormNew.arff -k 4000 -d 10)" 1> result.res 2> log.log