wearepal / EthicML

Package for evaluating the performance of methods which aim to increase fairness, accountability and/or transparency
https://wearepal.ai/EthicML/
GNU General Public License v3.0

Implementing non-standard metrics #611

Closed HammondOT closed 2 years ago

HammondOT commented 2 years ago

Hi,

I'm attempting to implement a non-standard metric to include in cross-validation, and I have created a class for it. Sorry for the length of the exposition; I'm a little unsteady on my feet here, so I want to give good information:

__all__ = ["foo"]

@dataclass
class foo(Metric):
    """Value of foo."""

    _name: ClassVar[str] = "foo"
    apply_per_sensitive: ClassVar[bool] = True

    @property
    def name(self) -> str:
        return self._name

    @implements(Metric)
    def score(self, prediction: Prediction, actual: DataTuple) -> float:
        # ... compute the metric here ...
        foo_score = ...  # placeholder for the computed value
        return foo_score

When I ran it in the normal way,

validator = ml.CrossValidator(ml.LR, hps, folds=5)

primary = ml.Accuracy()  # accuracy
fair_measure = foo()     # foo score
final = validator.run(train, measures=[primary, fair_measure])

I encountered,

/usr/local/lib/python3.7/dist-packages/ethicml/evaluators/cross_validator.py in <genexpr>(.0)
    246                 preds = model.run(train_fold, val)
    247                 scores = compute_scores_and_append(experiment, preds, val, i)
--> 248                 score_string = ", ".join(f"{k}={v:.4g}" for k, v in scores.items())
    249                 print(f"fold: {i}, model: '{model.name}', {score_string}, completed!")
    250         return CVResults(compute_scores_and_append.results, self.model)

TypeError: unsupported format string passed to dict.__format__
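
As far as I can tell the format spec is the problem: a dict cannot be formatted with .4g, which can be reproduced in isolation (the key name here is just a made-up example):

value = {"group_a-group_b": 0.08}  # made-up stand-in for whatever my metric returns
f"{value:.4g}"  # TypeError: unsupported format string passed to dict.__format__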

I did a dirty workaround for this by changing line 248 in cross_validator.py from

score_string = ", ".join(f"{k}={v:.4g}" for k, v in scores.items())
print(f"fold: {i}, model: '{model.name}', {score_string}, completed!")

To,

score_string = ", ".join(map(str, scores.values()))
print(f"fold: {i}, model: '{model.name}', {score_string}, completed!")

This happily prints the results during cross-validation, but before it spits out the best result I encounter

/usr/lib/python3.7/statistics.py in _exact_ratio(x)
    227         return (x, None)
    228     msg = "can't convert type '{}' to numerator/denominator"
--> 229     raise TypeError(msg.format(type(x).__name__))
    230 
    231 

TypeError: can't convert type 'dict' to numerator/denominator

I gather this is due to missing dictionary keys(?), but I've tied myself in knots trying to fix it. Ideally I don't want to have to change any source code.
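
In case it narrows things down, I can reproduce the same TypeError in isolation by handing statistics.mean dicts instead of numbers (just a guess at what is happening to the fold scores):

import statistics

# made-up fold scores; if each score is a dict rather than a number,
# averaging them fails exactly like the traceback above
fold_scores = [{"group_a-group_b": 0.08}, {"group_a-group_b": 0.11}]
statistics.mean(fold_scores)  # TypeError: can't convert type 'dict' to numerator/denominator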

Thanks in advance for any advice

olliethomas commented 2 years ago

Hi. I've taken a look into this, but I can't reproduce the error that you're experiencing. I guess the first thing to check is that you are on the latest version (0.7.1). If you are, can you post a minimum (non-)working example that demonstrates the error? When I plugged your code into the tests for both the metrics and the cross validator, no error occurred (assuming the returned value foo_score is a float), so if there is something that the tests have missed, we will want to get that remedied.

HammondOT commented 2 years ago

The shortest way to put it:

!pip install ethicml==0.7.1 -q

import ethicml as ml
from sklearn.preprocessing import StandardScaler

from dataclasses import dataclass
from typing import ClassVar

from ethicml import DataTuple, Metric, Prediction
from ranzen import implements  # the decorator used below; assumed here to come from ranzen

__all__ = ["new_metric"]

@dataclass
class new_metric(Metric):
    """Value of the new metric."""

    _name: ClassVar[str] = "new_metric"
    apply_per_sensitive: ClassVar[bool] = True

    @property
    def name(self) -> str:
        return self._name

    @implements(Metric)
    def score(self, prediction: Prediction, actual: DataTuple) -> float:
        samples = ml.metric_per_sensitive_attribute(prediction, actual, ml.TPR())
        metric_score = ml.diff_per_sensitive_attribute(samples)
        return metric_score

dataset = ml.adult(split="Sex")
data = dataset.load()

train, test = ml.train_test_split(data, train_percentage=0.7, random_seed=1)

scaler = StandardScaler()
train.x[dataset.continuous_features] = scaler.fit_transform(train.x[dataset.continuous_features])
test.x[dataset.continuous_features] = scaler.transform(test.x[dataset.continuous_features])

hps = {"C": [10, 2, 1, 0.1, 1e-2]}

validator = ml.CrossValidator(ml.LR, hps, folds=5)
primary = ml.Accuracy()
fair_measure = new_metric()
final = validator.run(train, measures=[primary, fair_measure])
best_result = final.get_best_in_top_k(primary, fair_measure, top_k=5)

Not the full code, but it reproduces the same errors under the same conditions as in the OP.

olliethomas commented 2 years ago

I'd encourage you to take a look at the signatures of the functions involved. All functions in EthicML have types specified, so this should be easy to look up in both the documentation and the source code.

For example, def score(self, prediction: Prediction, actual: DataTuple) -> float: means that the score function takes two inputs: prediction, which it expects to be of type Prediction, and actual, which it expects to be of type DataTuple. The -> indicates the return type, which is a float in this instance.

Because we have specified that this function returns a float, other parts of the code base that call it expect a float back. However, Python doesn't enforce type checking at runtime, so if you return something that isn't a float (in this example), the function itself will still run, but it may lead to a runtime error elsewhere.
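
In your snippet, score returns the dict produced by diff_per_sensitive_attribute, which is what trips up both the score formatting and the fold averaging. A minimal sketch of one way to honour the float contract, reusing the imports from your example (the reduction to a single number, here the largest per-group gap, is my assumption about what you intend):

@dataclass
class new_metric(Metric):
    """Largest TPR gap between sensitive groups (sketch)."""

    _name: ClassVar[str] = "new_metric"
    apply_per_sensitive: ClassVar[bool] = True

    @property
    def name(self) -> str:
        return self._name

    @implements(Metric)
    def score(self, prediction: Prediction, actual: DataTuple) -> float:
        per_group = ml.metric_per_sensitive_attribute(prediction, actual, ml.TPR())
        diffs = ml.diff_per_sensitive_attribute(per_group)
        # Reduce the per-group-pair dict to a single float so the declared
        # return type holds; taking the maximum gap is one possible choice.
        return max(diffs.values())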