jacobgil / confidenceinterval

The long missing library for python confidence intervals

Correctly handling binary classification with default parameters #6

Open bykhov opened 10 months ago

bykhov commented 10 months ago

The following code runs on the data in predictions.csv and computes recall in three different ways:

The results differ between sklearn and confidenceinterval. Is there any explanation for this effect?

import pandas as pd
import confidenceinterval
from sklearn.metrics import recall_score, confusion_matrix
df = pd.read_csv('predictions.csv')
y_true = df['true'].values.astype(bool)
y_pred = (df['pred'].values > 0.5).astype(bool)
# using confidenceinterval
re1 = confidenceinterval.recall_score(y_true, y_pred)[0]
# using sklearn
re2 = recall_score(y_true, y_pred)
# direct calculation from confusion matrix
conf_mat = confusion_matrix(y_true, y_pred)
re3 = conf_mat[1, 1] / (conf_mat[1, 1] + conf_mat[1, 0])
print(f'{re1:.4f}, {re2:.4f}, {re3:.4f}')

Results are: 0.7789, 0.7820, 0.7820
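
A plausible explanation (an assumption at this point, not confirmed above) is that confidenceinterval does not default to average='binary', and an average taken over both classes generally differs from sklearn's positive-class recall. A minimal sketch with synthetic data, since predictions.csv is not attached here:

import numpy as np
from sklearn.metrics import recall_score

# Hypothetical stand-in for the data in predictions.csv.
rng = np.random.default_rng(0)
y_true = rng.random(1000) > 0.4
y_pred = rng.random(1000) > 0.5

# 'binary' recall counts only the positive class: TP / (TP + FN).
binary = recall_score(y_true, y_pred, average='binary')

# 'micro' aggregates counts over both classes (equal to accuracy in the
# binary case); 'macro' takes the unweighted mean of per-class recalls.
micro = recall_score(y_true, y_pred, average='micro')
macro = recall_score(y_true, y_pred, average='macro')

print(f'binary: {binary:.4f}, micro: {micro:.4f}, macro: {macro:.4f}')

All three are legitimate definitions of recall, which is why the point estimates can disagree without either library being wrong.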

bykhov commented 10 months ago

The results match sklearn only when the parameters are specified explicitly:

confidenceinterval.recall_score(y_true, y_pred, average='binary', method='wilson')[0]
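
For completeness, a sketch of a sanity check against sklearn, again with hypothetical data in place of predictions.csv and assuming the call returns a (value, interval) pair, consistent with the [0] indexing used throughout this thread:

import numpy as np
from sklearn.metrics import recall_score as sk_recall_score
import confidenceinterval

# Hypothetical stand-in for the data in predictions.csv.
rng = np.random.default_rng(0)
y_true = rng.random(1000) > 0.4
y_pred = rng.random(1000) > 0.5

# With average='binary' spelled out, the point estimate should match
# sklearn's; the second return value is taken to be the Wilson interval.
recall, ci = confidenceinterval.recall_score(
    y_true, y_pred, average='binary', method='wilson')

assert np.isclose(recall, sk_recall_score(y_true, y_pred, average='binary'))
print(f'recall = {recall:.4f}, CI = {ci}')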