huawei-noah / streamDM

Stream Data Mining Library for Spark Streaming
http://streamdm.noahlab.com.hk/
Apache License 2.0

Adding the calculation for precision, recall, fbeta-score, ... #69

Closed hmgomes closed 7 years ago

hmgomes commented 7 years ago

This change adds precision, recall, specificity, and fbeta-score to BasicClassificationEvaluator.scala. All of these metrics are relevant for evaluating imbalanced classification problems.
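For reference, a minimal sketch of how these four metrics follow from binary confusion counts. This is not the code in this PR: the object name, the helper `ratio`, the key names "tp", "fp", "fn", "tn", and the choice to return 0.0 on a zero denominator are all assumptions made for illustration.

```scala
object ConfusionMetricsSketch {
  // Hypothetical helper: returns 0.0 instead of NaN when the denominator is zero.
  private def ratio(num: Double, den: Double): Double =
    if (den == 0.0) 0.0 else num / den

  // Precision = tp / (tp + fp)
  def precision(cm: Map[String, Double]): Double =
    ratio(cm("tp"), cm("tp") + cm("fp"))

  // Recall (sensitivity) = tp / (tp + fn)
  def recall(cm: Map[String, Double]): Double =
    ratio(cm("tp"), cm("tp") + cm("fn"))

  // Specificity = tn / (tn + fp)
  def specificity(cm: Map[String, Double]): Double =
    ratio(cm("tn"), cm("tn") + cm("fp"))

  // F-beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1 gives F1.
  def fbeta(cm: Map[String, Double], beta: Double = 1.0): Double = {
    val p = precision(cm)
    val r = recall(cm)
    val b2 = beta * beta
    ratio((1.0 + b2) * p * r, b2 * p + r)
  }
}
```

A beta above 1 weights recall more heavily than precision, which is often what is wanted when the positive class is rare.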

~It is possible to configure the beta value for fbeta-score in the EvaluatePrequential.scala (parameter -b), the way it is passed to BasicClassificationEvaluator is by using a generic dictionary of parameters. This approach can be used to pass specific parameters to Evaluators.scala descendants without disrupting the general interface.~ The beta hyperparameter has been moved to BasicClassificationEvaluator, so there is no longer any need to pass a dictionary of parameters. The tests were updated as well.

The current version extends the existing metric calculation in BasicEvaluationPrequential. It is therefore based on a single confusion matrix, and so far it cannot properly evaluate multi-class problems.

A future adaptation of BasicClassificationEvaluator should include a way to properly calculate the multi-class versions of the aforementioned metrics (e.g. using macro and micro averaging).
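To make the macro/micro distinction concrete, here is a rough sketch under the assumption that one-vs-rest counts are kept per class. Nothing here exists in streamDM; `Counts`, `MultiClassAveragingSketch`, and the method names are hypothetical.

```scala
object MultiClassAveragingSketch {
  // Hypothetical per-class one-vs-rest counts.
  final case class Counts(tp: Double, fp: Double, fn: Double)

  // Returns 0.0 instead of NaN when the denominator is zero.
  private def ratio(num: Double, den: Double): Double =
    if (den == 0.0) 0.0 else num / den

  // Macro average: compute the metric per class, then take the unweighted mean.
  def macroPrecision(perClass: Seq[Counts]): Double =
    ratio(perClass.map(c => ratio(c.tp, c.tp + c.fp)).sum, perClass.size.toDouble)

  def macroRecall(perClass: Seq[Counts]): Double =
    ratio(perClass.map(c => ratio(c.tp, c.tp + c.fn)).sum, perClass.size.toDouble)

  // Micro average: pool the counts over all classes, then compute the metric once.
  def microPrecision(perClass: Seq[Counts]): Double =
    ratio(perClass.map(_.tp).sum, perClass.map(c => c.tp + c.fp).sum)

  def microRecall(perClass: Seq[Counts]): Double =
    ratio(perClass.map(_.tp).sum, perClass.map(c => c.tp + c.fn).sum)
}
```

Macro averaging gives every class the same weight, so rare classes influence the result as much as frequent ones. Micro averaging is dominated by the frequent classes.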

Another small change within BasicClassificationEvaluator is that the internal representation of the confusion matrix was changed from a (Double, Double, Double, Double) to a Map[String, Double]. This makes it less likely to use, for example, fn where fp was intended, because one has to explicitly write something like x{"fn"} instead of x._1.
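As an illustration of the keyed representation, the sketch below accumulates a confusion map from (label, prediction) pairs. It is not the evaluator's actual code: the object and method names, the key names other than "fn" and "fp", and the convention that 1.0 marks the positive class are assumptions.

```scala
object ConfusionMapSketch {
  // All four cells start at zero; every cell is addressed by name, not position.
  val empty: Map[String, Double] =
    Map("tp" -> 0.0, "fp" -> 0.0, "fn" -> 0.0, "tn" -> 0.0)

  // Assumption: 1.0 is the positive class, anything else is negative.
  def cell(label: Double, prediction: Double): String =
    if (prediction == 1.0) {
      if (label == 1.0) "tp" else "fp"
    } else {
      if (label == 1.0) "fn" else "tn"
    }

  // Fold over (label, prediction) pairs, incrementing the matching cell.
  def count(pairs: Seq[(Double, Double)]): Map[String, Double] =
    pairs.foldLeft(empty) { case (cm, (label, prediction)) =>
      val key = cell(label, prediction)
      cm.updated(key, cm(key) + 1.0)
    }
}
```

With a tuple, swapping `_2` and `_3` compiles and silently yields wrong metrics. With the map, the key states which cell is being read.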

Tests