morinim / vita

Vita - Genetic Programming Framework
Mozilla Public License 2.0

Matthews correlation coefficient #5

Open morinim opened 7 years ago

morinim commented 7 years ago

Accuracy is misleading when the two classes are of very different sizes. For example, with 95 cats and only 5 dogs in the data set, the classifier could easily be biased into classifying every sample as a cat. The overall accuracy would be 95%, but in practice the classifier would have a 100% recognition rate for the cat class and a 0% recognition rate for the dog class.
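To make the numbers concrete, here is a minimal sketch of the cats/dogs example. It is written in Python for brevity and does not use Vita's API; the names `accuracy` and `recognition_rate` are illustrative helpers, not existing functions.

```python
# Hypothetical data set from the example: 95 cats and 5 dogs.
actual = ["cat"] * 95 + ["dog"] * 5
# A degenerate classifier that labels every sample as a cat.
predicted = ["cat"] * len(actual)


def accuracy(actual, predicted):
    """Fraction of samples whose predicted label matches the actual one."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


def recognition_rate(actual, predicted, label):
    """Fraction of the samples of class `label` that were predicted as `label`."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a == label]
    return sum(a == p for a, p in pairs) / len(pairs)


print(accuracy(actual, predicted))                 # 0.95
print(recognition_rate(actual, predicted, "cat"))  # 1.0
print(recognition_rate(actual, predicted, "dog"))  # 0.0
```

The 95% accuracy hides the fact that no dog is ever recognised.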

The Matthews correlation coefficient (MCC) is used in machine learning as a measure of the quality of binary (two-class) classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes. The MCC is in essence a correlation coefficient between the observed and predicted binary classifications; it returns a value between −1 and +1. A coefficient of +1 represents a perfect prediction, 0 a prediction no better than random, and −1 total disagreement between prediction and observation.
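A minimal sketch of the computation from the four counts of a binary confusion matrix follows. It is Python for brevity, not Vita code; the function name `mcc` and its parameters are assumptions made for illustration. Returning 0 when the denominator is zero is a common convention, also an assumption here rather than something decided in this issue.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP) * (TP+FN) * (TN+FP) * (TN+FN))
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an empty row or column of the confusion
        # matrix gives an undefined ratio, reported here as 0.
        return 0.0
    return numerator / denominator


# Taking "dog" as the positive class in a 95 cats / 5 dogs data set:
print(mcc(tp=0, tn=95, fp=0, fn=5))  # everything labelled "cat": 0.0
print(mcc(tp=5, tn=95, fp=0, fn=0))  # perfect prediction: 1.0
print(mcc(tp=0, tn=0, fp=95, fn=5))  # every label wrong: -1.0
```

Unlike accuracy, the classifier that labels every sample as a cat scores 0 here, the same as a random guess.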

For multiclass problems we can compute the MCC from the averaged table of confusion (see http://en.wikipedia.org/wiki/Confusion_matrix).
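One possible reading of the averaged table of confusion is sketched below: build a one-vs-rest table for each class, average the four counts over the classes, then apply the binary MCC formula to the averages. This is Python for brevity and only an assumption about how the idea could be realised, not a proposal for Vita's actual interface; `averaged_confusion_table` and `mcc` are hypothetical names, and the three-class sample data is made up.

```python
import math


def averaged_confusion_table(actual, predicted):
    """Average the one-vs-rest TP/TN/FP/FN counts over all classes."""
    labels = sorted(set(actual) | set(predicted))
    totals = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for label in labels:
        for a, p in zip(actual, predicted):
            if a == label and p == label:
                totals["tp"] += 1
            elif a != label and p == label:
                totals["fp"] += 1
            elif a == label and p != label:
                totals["fn"] += 1
            else:
                totals["tn"] += 1
    return {key: count / len(labels) for key, count in totals.items()}


def mcc(tp, tn, fp, fn):
    """Binary MCC; returns 0 when the denominator is zero (assumed convention)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Made-up three-class sample, for illustration only.
actual = ["cat", "cat", "dog", "dog", "rabbit", "rabbit"]
predicted = ["cat", "dog", "dog", "dog", "rabbit", "cat"]

table = averaged_confusion_table(actual, predicted)
print(table)
print(mcc(**table))  # about 0.5 for this sample
```

Because the MCC formula is unchanged when all four counts are multiplied by the same factor, averaging the per-class tables gives the same coefficient as summing them.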