paris-saclay-cds / ramp-workflow

Toolkit for building predictive workflows on top of pydata (pandas, scikit-learn, pytorch, keras, etc.).
https://paris-saclay-cds.github.io/ramp-docs/
BSD 3-Clause "New" or "Revised" License

FIX: Balanced accuracy as defined in sklearn #327

Closed duchesnay closed 1 year ago

duchesnay commented 1 year ago

Simply use the balanced accuracy as defined in sklearn: the macro average of the recalls obtained on each class. The problem was that ramp-workflow returned a different result than sklearn:

import numpy as np
from sklearn.metrics import balanced_accuracy_score
from sklearn.metrics import recall_score
from rampwf.score_types import BalancedAccuracy

y_true = [0, 1, 0, 0, 1, 0]
y_pred = [0, 1, 0, 0, 0, 1]
bac_sklearn = balanced_accuracy_score(y_true, y_pred)

bac = BalancedAccuracy()
bac.label_names = np.unique(y_true)
bac_rampwf = bac(y_true, y_pred)

print(bac_rampwf, bac_sklearn)
# 0.25 0.625
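
For reference, sklearn's balanced accuracy is exactly the macro-averaged recall; a minimal sanity check (added here, reusing the recall_score import from the snippet above):

macro_recall = recall_score(y_true, y_pred, average='macro')
print(macro_recall)
# 0.625, identical to bac_sklearn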

agramfort commented 1 year ago

thx @duchesnay but we cannot just change the way things are computed. This would silently change numbers for users/challenges. I would add a parameter to the BalancedAccuracy scorer to choose between the 2 definitions and keep the old behaviour as the default so it does not break anything.

frcaud commented 1 year ago

like this maybe:

class BalancedAccuracy(ClassifierBaseScoreType):
    is_lower_the_better = False
    minimum = 0.0
    maximum = 1.0

    def __init__(self, name='balanced_accuracy', precision=2, default=False):
        self.name = name
        self.precision = precision
        self.default = default

    def __call__(self, y_true_label_index, y_pred_label_index):
        if self.default:
            # sklearn definition: macro-averaged recall
            score = balanced_accuracy_score(y_true_label_index, y_pred_label_index)
        else:
            # historical ramp-workflow definition: macro-averaged recall
            # rescaled so that chance-level recall maps to 0
            mac = MacroAveragedRecall()
            tpr = mac(y_true_label_index, y_pred_label_index)
            base_tpr = 1. / len(self.label_names)
            score = (tpr - base_tpr) / (1 - base_tpr)
        return score
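
For the example above (two classes, so base_tpr = 0.5, and the macro-averaged recall is 0.625), this rescaling gives (0.625 - 0.5) / (1 - 0.5) = 0.25, which is the value the current BalancedAccuracy reports.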
frcaud commented 1 year ago

@duchesnay @agramfort

agramfort commented 1 year ago

I am not a fan of "default". Can we be more explicit? Maybe macro_average=True or False? @duchesnay you know better what makes the 2 variants different.

frcaud commented 1 year ago

I agree that default is not necessarily the best parameter name. What about sklearn_bacc, which would be False if we want to keep the current score type?

Another solution would be to leave this as is and implement a custom score_type (with sklearn bacc) in problem.py for this challenge.

agramfort commented 1 year ago

sklearn_bacc does not say what it does either. I would prefer a name that conveys the conceptual difference. Let's avoid a number object here.


frcaud commented 1 year ago

Ok, since the RAMP implementation is an adjusted balanced accuracy between 1/(1-nclasses) and 1 (instead of between 0 and 1 for balanced_accuracy_score from sklearn; in fact, with adjusted=True, balanced_accuracy_score also gives a value between 1/(1-nclasses) and 1), here is a rewritten version of BalancedAccuracy:

class BalancedAccuracy(ClassifierBaseScoreType):
    is_lower_the_better = False
    minimum = 0.0
    maximum = 1.0

    def __init__(self, name='balanced_accuracy', precision=2, adjusted=True):
        self.name = name
        self.precision = precision
        self.adjusted = adjusted

    def __call__(self, y_true_label_index, y_pred_label_index):
        """Balanced accuracy of a prediction.

        When adjusted=True, compute an adjusted balanced accuracy: the
        MacroAveragedRecall class gives the macro-averaged recall (the
        unweighted mean of the per-class recalls); the base true positive
        rate (the chance-level recall) is then subtracted and the result
        is divided by (1 - base true positive rate). The score lies
        between 1/(1-nclasses) and 1.
        When adjusted=False, use balanced_accuracy_score from sklearn.
        """
        if self.adjusted:
            mac = MacroAveragedRecall()
            tpr = mac(y_true_label_index, y_pred_label_index)
            base_tpr = 1. / len(self.label_names)
            score = (tpr - base_tpr) / (1 - base_tpr)
        else:
            score = balanced_accuracy_score(y_true_label_index, y_pred_label_index)
        return score

Or a simpler solution with no need for the MacroAveragedRecall class:

class BalancedAccuracy(ClassifierBaseScoreType):
    is_lower_the_better = False
    minimum = 0.0
    maximum = 1.0

    def __init__(self, name='balanced_accuracy', precision=2, adjusted=True):
        self.name = name
        self.precision = precision
        self.adjusted = adjusted

    def __call__(self, y_true_label_index, y_pred_label_index):
        """Balanced accuracy of a prediction.

        When adjusted=True, use sklearn's adjusted balanced_accuracy_score:
        the base true positive rate (the chance-level recall) is subtracted
        from the macro-averaged recall and the result is divided by
        (1 - base true positive rate). The score lies between 1/(1-nclasses)
        and 1.
        When adjusted=False, use the non-adjusted balanced_accuracy_score
        from sklearn.
        """
        if self.adjusted:
            score = balanced_accuracy_score(
                y_true_label_index, y_pred_label_index, adjusted=True)
        else:
            score = balanced_accuracy_score(
                y_true_label_index, y_pred_label_index)
        return score
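
For illustration, a hypothetical problem.py snippet (the score-type names below are made up, and it assumes the adjusted parameter from this proposal is merged) showing how a challenge could request either convention:

import rampwf as rw

score_types = [
    # historical ramp-workflow behaviour (chance-adjusted), proposed default
    rw.score_types.BalancedAccuracy(name='bal_acc_adj', precision=3, adjusted=True),
    # plain sklearn balanced accuracy (macro-averaged recall)
    rw.score_types.BalancedAccuracy(name='bal_acc', precision=3, adjusted=False),
]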
frcaud commented 1 year ago

Minimum is not really 0 though...

agramfort commented 1 year ago

@frcaud can you push directly in this PR?

frcaud commented 1 year ago

I would like to push to duchesnay/ramp-workflow in order to have a commit appear here, if possible. @duchesnay, could you give me the permission please?

duchesnay commented 1 year ago

@frcaud @agramfort Like in sklearn, I suggest using the parameter adjusted. Copy-paste from the sklearn doc:

sklearn.metrics.balanced_accuracy_score(y_true, y_pred, *, sample_weight=None, adjusted=False)
[...]
adjusted : bool, default=False
        When true, the result is adjusted for chance, so that random
        performance would score 0, while keeping perfect performance at a score
        of 1.

As in sklearn, I suggest using False as the default value. Indeed, the adjusted bacc is not what is usually expected.

Example showing that ramp actually uses the (unexpected) adjusted bacc:

import numpy as np

from sklearn.metrics import balanced_accuracy_score
from rampwf.score_types import BalancedAccuracy

y_true = [0, 1, 0, 0, 1, 0]
y_pred = [0, 1, 0, 0, 0, 1]
bac_sklearn = balanced_accuracy_score(y_true, y_pred)
bac_sklearn_adj = balanced_accuracy_score(y_true, y_pred, adjusted=True)

bac = BalancedAccuracy()
bac.label_names = np.unique(y_true)
bac_rampwf = bac(y_true, y_pred)

print(bac_rampwf, bac_sklearn_adj, bac_sklearn)
# 0.25 0.25 0.625
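
For completeness, the two conventions differ only by the chance-level rescaling; a small check reusing the variables from the snippet above:

n_classes = len(np.unique(y_true))
base_tpr = 1. / n_classes  # chance-level recall, 0.5 for two classes
print((bac_sklearn - base_tpr) / (1 - base_tpr))
# 0.25, i.e. bac_sklearn_adj (and the current bac_rampwf)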
frcaud commented 1 year ago

Thanks @duchesnay. We did indeed implement the new solution with adjusted as you suggested. As for the default value, it may be better to keep True as the default because of all the challenges that have already been deployed with this metric.

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00%; project coverage change: -0.01% :warning:

Comparison is base (2d349a8) 80.82% compared to head (969da89) 80.81%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #327      +/-   ##
==========================================
- Coverage   80.82%   80.81%   -0.01%     
==========================================
  Files         137      137             
  Lines        4948     4946       -2     
==========================================
- Hits         3999     3997       -2     
  Misses        949      949             
```

| [Impacted Files](https://codecov.io/gh/paris-saclay-cds/ramp-workflow/pull/327?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=paris-saclay-cds) | Coverage Δ |
|---|---|
| [rampwf/score\_types/balanced\_accuracy.py](https://codecov.io/gh/paris-saclay-cds/ramp-workflow/pull/327?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=paris-saclay-cds#diff-cmFtcHdmL3Njb3JlX3R5cGVzL2JhbGFuY2VkX2FjY3VyYWN5LnB5) | `100.00% <100.00%> (ø)` |
