scikit-learn-contrib / imbalanced-learn

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
https://imbalanced-learn.org

[BUG] geometric_mean_score with average='macro' #1096

Open vocarvalho opened 1 month ago

vocarvalho commented 1 month ago

Hello, and first of all, thank you for the package. I believe there is an error in the computation of geometric_mean_score when the average='macro' option is used. When the problem is multiclass, it works correctly (see below).

##################################################
# multiclass
import numpy as np
from imblearn.metrics import geometric_mean_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 1, 2]

# g-mean for each label
print('-----------------')
print('correct: ', geometric_mean_score(y_true, y_pred, average=None))

# macro: average of the per-label scores
print('-----------------')
vet = geometric_mean_score(y_true, y_pred, average=None)
print('correct: ', np.mean(vet))
print('correct: ', geometric_mean_score(y_true, y_pred, average='macro'))

# Output
# -----------------
# correct:  [1.         0.61237244 0.61237244]
# -----------------
# correct:  0.7415816237971963
# correct:  0.7453559924999299
##################################################
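
As a sanity check, the per-label values above are consistent with the g-mean of each label being sqrt(sensitivity * specificity) in a one-vs-rest sense. A minimal sketch verifying this reading (the one-vs-rest relabelling below is my own, not imblearn code):

##################################################
# sketch: reproduce the per-label g-means one-vs-rest
import numpy as np
from sklearn.metrics import recall_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 1, 2]

for label in [0, 1, 2]:
    # binarize: the current label is positive (1), everything else negative (0)
    t = [1 if y == label else 0 for y in y_true]
    p = [1 if y == label else 0 for y in y_pred]
    sens = recall_score(t, p, pos_label=1)  # recall of the label itself
    spec = recall_score(t, p, pos_label=0)  # recall of all other labels
    print(label, np.sqrt(sens * spec))

# Output
# 0 1.0
# 1 0.6123724356957945
# 2 0.6123724356957945
##################################################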

However, when the problem is binary, it works incorrectly (see below). What I think it is doing is computing the g-mean of the macro-averaged TPR and TNR (reproduced in the code below).

##################################################
# binary
import numpy as np
from imblearn.metrics import geometric_mean_score
from sklearn.metrics import recall_score

y_true = [0, 0, 1, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]

# g-mean for each label
print('-----------------')
print('correct: ', geometric_mean_score(y_true, y_pred, average=None))

# macro: wrong, it should be the average of the per-label scores, i.e. np.mean(vet)
print('-----------------')
vet = geometric_mean_score(y_true, y_pred, average=None)
print('correct: ', np.mean(vet))
# wrong
print('incorrect: ', geometric_mean_score(y_true, y_pred, average='macro'))

# what I think it is doing: computing the g-mean from the macro-averaged TPR and TNR
print('-----------------')
# class 0 as the class of interest
TPR_0 = recall_score(y_true, y_pred, average='binary', pos_label=0)  # sensitivity
TNR_1 = recall_score(y_true, y_pred, average='binary', pos_label=1)  # specificity

# class 1 as the class of interest
TPR_1 = recall_score(y_true, y_pred, average='binary', pos_label=1)  # sensitivity
TNR_0 = recall_score(y_true, y_pred, average='binary', pos_label=0)  # specificity

macro_TPR = (TPR_0 + TPR_1) / 2
macro_TNR = (TNR_0 + TNR_1) / 2
print(macro_TPR)
print(macro_TNR)

gmean = np.sqrt(macro_TPR * macro_TNR)
print('incorrect: ', gmean)

# Output
# -----------------
# correct:  [0.57735027 0.57735027]
# -----------------
# correct:  0.5773502691896257
# incorrect:  0.6666666666666666
# -----------------
# 0.6666666666666666
# 0.6666666666666666
# incorrect:  0.6666666666666666
##################################################
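
And, for comparison, a minimal sketch of the macro behaviour I would expect, i.e. the unweighted mean of the per-label g-means (the macro_gmean helper is hypothetical, not part of imblearn):

##################################################
# sketch: expected macro behaviour (helper name is mine)
import numpy as np
from imblearn.metrics import geometric_mean_score

def macro_gmean(y_true, y_pred):
    # unweighted mean of the per-label g-means
    return np.mean(geometric_mean_score(y_true, y_pred, average=None))

print(macro_gmean([0, 0, 1, 0, 1, 1], [0, 0, 0, 0, 0, 1]))  # 0.5773502691896257
print(macro_gmean([0, 1, 2, 0, 1, 2], [0, 2, 1, 0, 1, 2]))  # 0.7415816237971963
##################################################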

I would like to hear back whether my reasoning is correct. I hope this helps in some way. Regards.