Open: bykhov opened this issue 10 months ago
The following code runs on the data in predictions.csv and uses 3 methods for a recall evaluation:

- `recall_score` from `confidenceinterval`
- `recall_score` from `sklearn.metrics`
- direct calculation from the confusion matrix

The results are different between `sklearn` and `confidenceinterval`. Is there any explanation for this effect?
```python
import pandas as pd
import confidenceinterval
from sklearn.metrics import recall_score, confusion_matrix

df = pd.read_csv('predictions.csv')
y_true = df['true'].values.astype(bool)
y_pred = (df['pred'].values > 0.5).astype(bool)

# using confidenceinterval
re1 = confidenceinterval.recall_score(y_true, y_pred)[0]

# using sklearn
re2 = recall_score(y_true, y_pred)

# direct calculation from confusion matrix
conf_mat = confusion_matrix(y_true, y_pred)
re3 = conf_mat[1, 1] / (conf_mat[1, 1] + conf_mat[1, 0])

print(f'{re1:.4f}, {re2:.4f}, {re3:.4f}')
```
Results are: 0.7789, 0.7820, 0.7820
The results match only when the arguments are passed explicitly:

```python
confidenceinterval.recall_score(y_true, y_pred, average='binary', method='wilson')[0]
```
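Possibly relevant (an assumption on my side, I have not checked the library's defaults): `method` only changes the confidence interval, not the point estimate, so the 0.7789 vs 0.7820 gap would have to come from a different default for `average`. A minimal sketch on synthetic labels showing how a micro-averaged recall diverges from the binary recall that sklearn reports by default:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Synthetic stand-in for predictions.csv (hypothetical data, fixed seed for repeatability).
rng = np.random.default_rng(0)
y_true = rng.random(1000) > 0.5
y_pred = rng.random(1000) > 0.45

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# 'binary' recall: TP / (TP + FN), what sklearn computes by default here.
recall_binary = tp / (tp + fn)

# Micro-averaged recall pools TP and FN over both classes, which for a
# single-label binary problem reduces to accuracy: (TP + TN) / total.
recall_micro = (tp + tn) / (tn + fp + fn + tp)

print(f'binary: {recall_binary:.4f}  (sklearn default: {recall_score(y_true, y_pred):.4f})')
print(f'micro : {recall_micro:.4f}  (sklearn micro:   {recall_score(y_true, y_pred, average="micro"):.4f})')
```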