Closed: duchesnay closed this 1 year ago.
thx @duchesnay but we cannot just change the way things are computed. This will silently change numbers for users/challenges. I would add a parameter to the `BalancedAccuracy` scorer to choose between the 2 definitions, and keep the default as the old one so it does not break anything.
like this maybe:
```
# imports assumed here, matching the layout of rampwf.score_types
from sklearn.metrics import balanced_accuracy_score

from .classifier_base import ClassifierBaseScoreType
from .macro_averaged_recall import MacroAveragedRecall


class BalancedAccuracy(ClassifierBaseScoreType):
    is_lower_the_better = False
    minimum = 0.0
    maximum = 1.0

    def __init__(self, name='balanced_accuracy', precision=2, default=False):
        self.name = name
        self.precision = precision
        self.default = default

    def __call__(self, y_true_label_index, y_pred_label_index):
        if self.default:
            # sklearn's definition: macro average of per-class recalls
            score = balanced_accuracy_score(
                y_true_label_index, y_pred_label_index)
        else:
            # historical RAMP definition: macro-averaged recall,
            # adjusted for chance
            mac = MacroAveragedRecall()
            tpr = mac(y_true_label_index, y_pred_label_index)
            base_tpr = 1. / len(self.label_names)
            score = (tpr - base_tpr) / (1 - base_tpr)
        return score
```
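A hypothetical usage sketch, assuming the proposal above lands as written (the `default` flag does not exist in the released API):

```
# Hypothetical: the historical RAMP (adjusted) definition stays the default,
# sklearn's balanced_accuracy_score must be requested explicitly.
bacc_ramp = BalancedAccuracy()                 # old behaviour, scores unchanged
bacc_sklearn = BalancedAccuracy(default=True)  # sklearn's definition
```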
@duchesnay @agramfort I am not a fan of `default`. Can we be more explicit? Maybe `macro_average=True` or `False`? @duchesnay, you know better what makes the 2 variants different.
I agree that `default` is not necessarily the best parameter name. What about `sklearn_bacc`, which would be `False` if we want to keep the current score type?

Another solution would be to leave this as is and implement a custom `score_type` (with sklearn bacc) in `problem.py` for this challenge.
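For reference, such a custom score type could look roughly like this; a sketch only, assuming `ClassifierBaseScoreType` can be imported from `rampwf.score_types.classifier_base` as in the current source layout:

```
# Hypothetical custom score type in a single challenge's problem.py,
# wrapping sklearn's balanced_accuracy_score directly.
from sklearn.metrics import balanced_accuracy_score
from rampwf.score_types.classifier_base import ClassifierBaseScoreType


class SklearnBalancedAccuracy(ClassifierBaseScoreType):
    is_lower_the_better = False
    minimum = 0.0
    maximum = 1.0

    def __init__(self, name='sklearn_balanced_accuracy', precision=2):
        self.name = name
        self.precision = precision

    def __call__(self, y_true_label_index, y_pred_label_index):
        return balanced_accuracy_score(
            y_true_label_index, y_pred_label_index)
```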
`sklearn_bacc` does not say what it does either. I would prefer a name that says what the conceptual difference is. Let's avoid a number object here.
OK. Since the RAMP implementation is an adjusted balanced accuracy, ranging between 1/(1-nclasses) and 1 (instead of between 0 and 1 for `balanced_accuracy_score` from sklearn; and in fact, with `adjusted=True`, `balanced_accuracy_score` also gives a bacc between 1/(1-nclasses) and 1), here is a rewritten version of `BalancedAccuracy`:
```
class BalancedAccuracy(ClassifierBaseScoreType):
    is_lower_the_better = False
    minimum = 0.0
    maximum = 1.0

    def __init__(self, name='balanced_accuracy', precision=2, adjusted=True):
        self.name = name
        self.precision = precision
        self.adjusted = adjusted

    def __call__(self, y_true_label_index, y_pred_label_index):
        """When adjusted=True, compute an adjusted balanced accuracy:
        use the MacroAveragedRecall class to get the macro-averaged
        recall (i.e. the unweighted mean of the recall scores of all
        classes), then subtract the base true positive rate (i.e. the
        chance recall) and divide the result by (1 - base true positive
        rate). The score is then between 1/(1-nclasses) and 1.

        When adjusted=False, use balanced_accuracy_score from sklearn.
        """
        if self.adjusted:
            mac = MacroAveragedRecall()
            tpr = mac(y_true_label_index, y_pred_label_index)
            base_tpr = 1. / len(self.label_names)
            score = (tpr - base_tpr) / (1 - base_tpr)
        else:
            score = balanced_accuracy_score(
                y_true_label_index, y_pred_label_index)
        return score
```
Or a simpler solution, with no need for the MacroAveragedRecall class:

```
class BalancedAccuracy(ClassifierBaseScoreType):
    is_lower_the_better = False
    minimum = 0.0
    maximum = 1.0

    def __init__(self, name='balanced_accuracy', precision=2, adjusted=True):
        self.name = name
        self.precision = precision
        self.adjusted = adjusted

    def __call__(self, y_true_label_index, y_pred_label_index):
        """When adjusted=True, use sklearn's adjusted
        balanced_accuracy_score, which subtracts the base true positive
        rate (i.e. the chance recall) from the macro-averaged recall
        and divides the result by (1 - base true positive rate). The
        score is then between 1/(1-nclasses) and 1.

        When adjusted=False, use the non-adjusted
        balanced_accuracy_score from sklearn.
        """
        if self.adjusted:
            score = balanced_accuracy_score(
                y_true_label_index, y_pred_label_index, adjusted=True)
        else:
            score = balanced_accuracy_score(
                y_true_label_index, y_pred_label_index)
        return score
```
Minimum is not really 0 though...
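A quick check of that point with sklearn alone (nothing ramp-specific assumed): with worse-than-chance predictions the adjusted score goes negative, down to 1/(1-nclasses):

```
from sklearn.metrics import balanced_accuracy_score

# Every label flipped: worse than chance for both classes.
y_true = [0, 1, 0, 1]
y_pred = [1, 0, 1, 0]

print(balanced_accuracy_score(y_true, y_pred, adjusted=True))
# -1.0, i.e. 1 / (1 - nclasses) with nclasses = 2
```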
@frcaud can you push directly in this PR?
I would like to push to duchesnay/ramp-workflow in order to have a commit appear here, if possible, @duchesnay. If Edouard could give me the permission, please.
@frcaud @agramfort Like in sklearn, I suggest using the parameter `adjusted`. Copy-paste from the sklearn doc:

```
sklearn.metrics.balanced_accuracy_score(y_true, y_pred, *, sample_weight=None, adjusted=False)
[...]
adjusted : bool, default=False
    When true, the result is adjusted for chance, so that random
    performance would score 0, while keeping perfect performance at a score
    of 1.
```

As in sklearn, I suggest using `False` as the default value. Indeed, adjusted bacc is not what is usually expected.
Example showing that ramp actually uses the unexpected adjusted bacc:

```
import numpy as np
from sklearn.metrics import balanced_accuracy_score
from rampwf.score_types import BalancedAccuracy

y_true = [0, 1, 0, 0, 1, 0]
y_pred = [0, 1, 0, 0, 0, 1]

bac_sklearn = balanced_accuracy_score(y_true, y_pred)
bac_sklearn_adj = balanced_accuracy_score(y_true, y_pred, adjusted=True)

bac = BalancedAccuracy()
bac.label_names = np.unique(y_true)
bac_rampwf = bac(y_true, y_pred)

print(bac_rampwf, bac_sklearn_adj, bac_sklearn)
# 0.25 0.25 0.625
```
Thanks @duchesnay. We did indeed implement the new solution with `adjusted` as you suggest. As for the default value, maybe it is better to leave `True` as the default, because of all the challenges that have already been deployed with this metric.
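With that default, a challenge that wants sklearn's usual definition can still ask for it explicitly in its `problem.py`; a minimal sketch, assuming the `adjusted` parameter is merged as discussed above:

```
# Hypothetical problem.py excerpt: deployed challenges keep adjusted=True
# (the historical scores); new challenges can opt in to sklearn's definition.
import rampwf as rw

score_types = [
    rw.score_types.BalancedAccuracy(
        name='balanced_accuracy', precision=3, adjusted=False),
]
```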
Patch coverage: 100.00% and project coverage change: -0.01% :warning:

Comparison is base (2d349a8) 80.82% compared to head (969da89) 80.81%.

:umbrella: View full report in Codecov by Sentry.
Simply use the balanced accuracy as defined in sklearn: the macro average of the recalls obtained on each class. Indeed, the problem was that ramp-workflow provided a different result than sklearn:

```
import numpy as np
from sklearn.metrics import balanced_accuracy_score
from sklearn.metrics import recall_score
from rampwf.score_types import BalancedAccuracy

y_true = [0, 1, 0, 0, 1, 0]
y_pred = [0, 1, 0, 0, 0, 1]
bac_sklearn = balanced_accuracy_score(y_true, y_pred)

bac = BalancedAccuracy()
bac.label_names = np.unique(y_true)
bac_rampwf = bac(y_true, y_pred)

print(bac_rampwf, bac_sklearn)
# 0.25 0.625
```
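A quick check that sklearn's balanced accuracy is indeed the macro average of the per-class recalls (only standard sklearn calls used):

```
from sklearn.metrics import recall_score

y_true = [0, 1, 0, 0, 1, 0]
y_pred = [0, 1, 0, 0, 0, 1]

# recall(class 0) = 3/4, recall(class 1) = 1/2 -> macro average = 0.625
print(recall_score(y_true, y_pred, average='macro'))
# 0.625, identical to balanced_accuracy_score(y_true, y_pred)
```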