Open · AdamBajger opened this issue 7 months ago
Hello, thanks for sharing.
The difference is that recall_score defaults to 'micro' averaging. If you pass average='binary', it calls tpr_score under the hood: https://github.com/jacobgil/confidenceinterval/blob/main/confidenceinterval/takahashi_methods.py#L374
Recall is equivalent to TPR only with 'binary' averaging: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html
I admit that this isn't clear enough from the README; better documentation would help a lot here.
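For illustration, a minimal sketch of that difference (made-up labels; call signatures as shown in the README, where each metric returns a (value, ci) tuple):

```python
from confidenceinterval import recall_score, tpr_score

# Made-up binary labels, purely for illustration.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 1, 1, 0]

# Default averaging is 'micro', which for a single-label problem works out
# to overall accuracy rather than the true positive rate.
recall_micro, ci_micro = recall_score(y_true, y_pred)

# average='binary' delegates to tpr_score, so these two should agree.
recall_binary, ci_binary = recall_score(y_true, y_pred, average='binary')
sensitivity, ci_tpr = tpr_score(y_true, y_pred)

print(recall_micro, recall_binary, sensitivity)  # e.g. 0.625 vs 0.75 vs 0.75
```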
Sensitivity, a.k.a. true positive rate, should be calculated consistently across the library. I can understand that there will be slight differences when using bootstrap methods to calculate the confidence intervals, but not an inconsistency like the one in this minimal working example:
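A hypothetical sketch of the kind of mismatch described (made-up labels, not the original snippet):

```python
from confidenceinterval import recall_score, tpr_score

y_true = [0, 1, 1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 1, 1, 0]

# Both are intended to express sensitivity, yet the point estimates differ.
sensitivity, _ = tpr_score(y_true, y_pred)   # 0.75
recall, _ = recall_score(y_true, y_pred)     # 0.5
print(sensitivity, recall)
```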
I have looked into the source code and identified several inconsistencies in the docstrings, where the terms "sensitivity" and "specificity" are mixed arbitrarily, which points to unchecked copy-pasting of code and is likely where the error originates. I have no clue where exactly the error lies, though.