This change adds precision, recall, specificity, and fbeta-score to BasicClassificationEvaluator.scala. All of these metrics are relevant for evaluating imbalanced classification problems.
~It is possible to configure the beta value for fbeta-score in the EvaluatePrequential.scala (parameter -b), the way it is passed to BasicClassificationEvaluator is by using a generic dictionary of parameters. This approach can be used to pass specific parameters to Evaluators.scala descendants without disrupting the general interface.~ The beta hyperparameter has been moved into BasicClassificationEvaluator, which removes the need for a generic dictionary of parameters. The tests were updated accordingly.
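As a sketch of how a beta owned directly by the evaluator factors into the Fbeta-score (the class and method names here are illustrative, not the actual streamDM API):

```scala
// Hypothetical sketch (not the actual streamDM code): an evaluator that
// owns its beta hyperparameter directly, so no parameter dictionary is needed.
class FBetaSketch(beta: Double = 1.0) {
  private val b2 = beta * beta

  // Fbeta = (1 + beta^2) * P * R / (beta^2 * P + R)
  def fBeta(precision: Double, recall: Double): Double = {
    val denom = b2 * precision + recall
    if (denom == 0.0) 0.0 else (1.0 + b2) * precision * recall / denom
  }
}

// beta = 1 recovers the F1-score; beta < 1 favors precision, beta > 1 recall.
val f1 = new FBetaSketch().fBeta(0.5, 0.5) // 0.5
```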
The current version extends the existing metric calculation in BasicEvaluationPrequential; it is therefore based on a single confusion matrix, and it is not yet possible to properly evaluate multi-class problems.
A future adaptation of this BasicClassificationEvaluator should include a way to properly calculate the multi-class versions of the aforementioned metrics (e.g. using macro and micro averaging).
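One possible shape for such an extension, sketched here with made-up per-class counts (this is illustrative only and not part of this change):

```scala
// Illustrative sketch of macro vs. micro averaged precision over per-class
// counts. Each class contributes its own (tp, fp) pair.
case class ClassCounts(tp: Double, fp: Double)

// Macro: compute precision per class, then average, weighting classes equally.
def macroPrecision(cs: Seq[ClassCounts]): Double =
  cs.map(c => if (c.tp + c.fp == 0.0) 0.0 else c.tp / (c.tp + c.fp)).sum / cs.size

// Micro: pool the counts first, so frequent classes dominate the result.
def microPrecision(cs: Seq[ClassCounts]): Double = {
  val (tp, fp) = (cs.map(_.tp).sum, cs.map(_.fp).sum)
  if (tp + fp == 0.0) 0.0 else tp / (tp + fp)
}

val counts = Seq(ClassCounts(90, 10), ClassCounts(1, 9))
// macroPrecision(counts) == 0.5 (per-class 0.9 and 0.1, averaged equally)
// microPrecision(counts) is dominated by the majority class (91 / 110)
```

The gap between the two averages is exactly what matters for imbalanced problems: macro exposes poor performance on the minority class, while micro hides it.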
Another small change within BasicClassificationEvaluator is that the internal representation of the confusion matrix was changed from a (Double, Double, Double, Double) tuple to a Map[String, Double]. This makes it less likely to mistakenly use, for example, fn instead of fp, since one has to explicitly write x("fn") instead of x._1.
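The idea behind the Map representation can be sketched like this (a simplified stand-in with made-up counts, not the actual BasicClassificationEvaluator code):

```scala
// Named keys document which count each formula uses; with a tuple,
// x._1 vs. x._2 gives no hint which of the four counts is which.
val x: Map[String, Double] =
  Map("tp" -> 40.0, "fp" -> 5.0, "tn" -> 50.0, "fn" -> 10.0)

val precision   = x("tp") / (x("tp") + x("fp")) // 40 / 45
val recall      = x("tp") / (x("tp") + x("fn")) // 40 / 50
val specificity = x("tn") / (x("tn") + x("fp")) // 50 / 55
```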
Tests
Using f0.5-score
./spark.sh "EvaluatePrequential -l (org.apache.spark.streamdm.classifiers.bayes.MultinomialNaiveBayes) -e (BasicClassificationEvaluator -b 0.5) -h" 1> log_fbeta_05.txt 2> error.txt
Using f1.0-score (default configuration, no need to set -b)
./spark.sh "EvaluatePrequential -l (org.apache.spark.streamdm.classifiers.bayes.MultinomialNaiveBayes) -h" 1> log_f1.txt 2> error.txt
Using f2-score
./spark.sh "EvaluatePrequential -l (org.apache.spark.streamdm.classifiers.bayes.MultinomialNaiveBayes) -e (BasicClassificationEvaluator -b 2.0) -h" 1> log_fbeta_2.txt 2> error.txt