mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

Add fairness #612

Closed pplonski closed 1 year ago

pplonski commented 1 year ago

Implement AutoML fairness based on https://arxiv.org/abs/2111.06495 by @qingyun-wu and @sonichi

Requirements:

Example code:

X, y = load_training_data()

# init AutoML
automl = AutoML()

# case 1) training with sensitive attributes, use default fairness_metric
automl.fit(X, y, sensitive_features=["feature1", "feature2"])

# case 2) training with sensitive attributes and select fairness_metric
automl.fit(X, y, sensitive_features=["feature1", "feature2"], fairness_metric="equalized_odds")

# case 3) training with sensitive attributes and set custom fairness_metric
def custom_fairness_metric(y_true, y_pred, sensitive_features, sample_weight=None):
    # implementation of the custom fairness metric goes here
    ...

automl.fit(X, y, sensitive_features=["feature1", "feature2"], fairness_metric=custom_fairness_metric)
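For case 3, the custom metric body could look like this sketch of a worst-case demographic parity difference across the sensitive features (a hypothetical implementation, only the signature comes from the proposal above):

import pandas as pd

def custom_fairness_metric(y_true, y_pred, sensitive_features, sample_weight=None):
    # hypothetical example: worst demographic parity difference over all
    # sensitive features; assumes y_pred holds binary labels and
    # sensitive_features is a pandas DataFrame
    worst_diff = 0.0
    for column in sensitive_features.columns:
        selection_rate = pd.Series(y_pred).groupby(sensitive_features[column].values).mean()
        worst_diff = max(worst_diff, selection_rate.max() - selection_rate.min())
    return worst_diff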
codeboy5 commented 1 year ago

Hi @pplonski, I would be interested in contributing to this. Is there any way I can help?

pplonski commented 1 year ago

Hi @codeboy5!

Thank you for your offer! I started to study FairAutoML more, and I don't like the approach from the paper. It looks good in theory, but for real-life problems it might be unusable. Just imagine mitigating unfairness when doing 10-fold cross-validation: applying Exponentiated Gradient to each model in each fold might be very inefficient. I also found that Exponentiated Gradient has trouble optimizing for more than 1 sensitive feature; for example, if you have 2 sensitive features (A and B), then a mitigated model might be fair for feature A but unfair for feature B...

I would like to have a method that will search for sample weights that provide fairness. Then I would like to reuse the same sample weights when doing the hyperparameters search.

So I'm in the process of searching for a method for fair-optimal sample weighting...
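A minimal sketch of what I have in mind (reweighing in the spirit of Kamiran & Calders; the helper name and details are my assumption, not an existing API): compute per-group weights once from the training data, then reuse the same sample_weight for every model trained during the hyperparameters search.

import pandas as pd

def fairness_sample_weight(y, sensitive):
    # weight each (group, label) cell by P(group) * P(label) / P(group, label),
    # so under-represented combinations get larger weights;
    # assumes y and sensitive are pandas Series of equal length
    df = pd.DataFrame({"group": sensitive.values, "label": y.values})
    p_group = df["group"].value_counts(normalize=True)
    p_label = df["label"].value_counts(normalize=True)
    p_joint = df.value_counts(normalize=True)
    return df.apply(
        lambda row: p_group[row["group"]] * p_label[row["label"]]
        / p_joint[(row["group"], row["label"])],
        axis=1,
    ).values

# the same weights would then be reused for every model in the search, e.g.
# sample_weight = fairness_sample_weight(y, X["sex"])
# automl.fit(X, y, sample_weight=sample_weight)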

@codeboy5 do you have experience in fair ML or in optimization theory?

pplonski commented 1 year ago

I created a fairness module. It can compute fairness metrics and plots for binary classification tasks. It computes statistics for every sensitive feature separately.

Link to the module: https://github.com/mljar/mljar-supervised/tree/fairness/supervised/fairness

Example script that computes fairness metrics: https://github.com/mljar/mljar-supervised/blob/fairness/examples/scripts/binary_classifier_adult_fairness.py

The API:

automl = AutoML(algorithms=["Xgboost"])
automl.fit(X_train, y_train, sensitive_features=sensitive_features_train)
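For context, the demographic parity ratio reported per sensitive feature boils down to something like this (an illustrative sketch, not the module's actual code):

import pandas as pd

def demographic_parity_ratio(y_pred, sensitive_feature):
    # selection rate = fraction of positive predictions per group;
    # the DP ratio is the smallest selection rate divided by the largest;
    # assumes binary y_pred and a pandas Series of group values
    selection_rate = pd.Series(y_pred).groupby(sensitive_feature.values).mean()
    return selection_rate.min() / selection_rate.max()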

TODO:

  1. The preprocessing for sensitive features should be improved. We need to remove rows with a missing sensitive feature value, and we should remove sensitive feature rows when the target is missing.
  2. Only split validation supports sensitive features right now. It should be extended to all supported validation strategies.
  3. Better handle situations when sensitive features and sample weights are provided.
  4. Handle continuous sensitive features.

Example report with information about fairness:

[screenshot: fairness-metrics]

pplonski commented 1 year ago

Improvements:

Questions:

Demo: [screen recording: Peek 2023-05-01 13-34]

pplonski commented 1 year ago

I think there should be at least 20 samples of the same group for it to be considered in fairness mitigation. For example, if we have a group defined by "Female", "Young<30", "Black", and there are only 5 samples in this group (0 samples with class 1), this group shouldn't be considered for computing fairness metrics and shouldn't be considered for fairness mitigation.
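A sketch of how the cutoff could be applied (hypothetical helper; the 20-sample threshold is the only number taken from the comment above):

MIN_GROUP_SIZE = 20

def groups_to_consider(sensitive_features):
    # count samples per combination of sensitive feature values and keep only
    # groups with at least MIN_GROUP_SIZE samples; assumes a pandas DataFrame
    counts = sensitive_features.value_counts()
    return counts[counts >= MIN_GROUP_SIZE].index.tolist()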

pplonski commented 1 year ago

I've pushed the work-in-progress version of fairness mitigation. There are a lot of prints in the terminal - it is a working version.

The algorithm is optimizing the demographic parity ratio (it is hard-coded). The output of mitigation for a single feature (sex): [screenshot: single-feature-mitigation]

The output of mitigation for two features (sex, is_young), where is_young is a categorical feature created from age<50: [screenshot: two-features-mitigation]
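My reading of the mitigation loop, as a sketch with hypothetical names (the actual code is in the fairness branch): after each training round, boost the sample weights of the group with the lowest selection rate until the demographic parity ratio reaches the target.

import numpy as np
import pandas as pd

def mitigate(train_model, X, y, sensitive, target_ratio=0.8, step=0.1, max_iters=10):
    # sensitive is assumed to be a pandas Series with one group label per sample
    weights = np.ones(len(y), dtype=float)
    for _ in range(max_iters):
        model = train_model(X, y, sample_weight=weights)
        y_pred = model.predict(X)
        rates = pd.Series(y_pred).groupby(sensitive.values).mean()
        if rates.min() / rates.max() >= target_ratio:
            break
        # increase the weights of the least selected group and retrain
        weights[sensitive.values == rates.idxmin()] *= 1.0 + step
    return weights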

pplonski commented 1 year ago

TODO:

pplonski commented 1 year ago

If a feature is not categorical, it is automatically converted into a binary feature based on an equal number of samples in each bin. We print the information in the terminal, for example:

Sensitive features should be categorical
Apply automatic binarization for feature age
New values ['(37.0, 90.0]', '(16.999, 37.0]'] for feature age are applied
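The equal-count split can be reproduced with pandas.qcut, which produces exactly this kind of interval labels (a sketch of the idea, not necessarily the exact implementation):

import pandas as pd
from sklearn.datasets import fetch_openml

data = fetch_openml(data_id=1590, as_frame=True)
age = data.data["age"]

# two bins with (roughly) the same number of samples in each
age_binary = pd.qcut(age, q=2).astype(str)
print(age_binary.value_counts())
# e.g. '(16.999, 37.0]' and '(37.0, 90.0]' as in the log above
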
pplonski commented 1 year ago

The weights optimization stop condition is not yet implemented. This gives interesting behavior of the algorithm. I was running the algorithm with privileged_groups and unprivileged_groups provided in the API, and the DP ratio went above 1.0.

Please notice that the script below uses two sensitive features, sex and age. The privileged group is defined only for the sex feature, and for this feature the ratio goes above 1.0 (because there is no stop condition).

The age is passed as a continuous feature that is automatically converted into binary.

import pandas as pd
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML
from sklearn.datasets import fetch_openml

data = fetch_openml(data_id=1590, as_frame=True)

X = data.data
y = (data.target == ">50K") * 1

sensitive_features = X[["sex", "age"]] 

X_train, X_test, y_train, y_test, S_train, S_test = train_test_split(
    X, y, sensitive_features, stratify=y, test_size=0.5, random_state=42
)

automl = AutoML(algorithms=["Xgboost"],
                train_ensemble=False,
                fairness_metric="demographic_parity_ratio",
                fairness_threshold=0.8,
                privileged_groups = [{"sex": "Male"}],
                unprivileged_groups = [{"sex": "Female"}]
            )

automl.fit(X_train, y_train, sensitive_features=S_train)

Output: [screenshot: fairness-above-1.0]
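One way to add the missing stop condition (an assumption on my side, not in the branch yet) would be to stop updating the weights once the unprivileged/privileged ratio enters the acceptable range, so it cannot overshoot above 1.0:

def should_stop(selection_rates, privileged_group, unprivileged_group, fairness_threshold=0.8):
    # selection_rates: mapping from group value to selection rate for one sensitive feature
    ratio = selection_rates[unprivileged_group] / selection_rates[privileged_group]
    return fairness_threshold <= ratio <= 1.0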

pplonski commented 1 year ago

I've run the AutoML with several algorithms on the Adult dataset with two sensitive features. Below is the output from the example script: [screen recording: Peek 2023-05-05 19-38]

pplonski commented 1 year ago

Preview of Fair Ensemble

[screen recording: Peek 2023-05-17 13-09]

pplonski commented 1 year ago

Issues:

mosaikme commented 9 months ago

Can we disable the Fairness Metric?

pplonski commented 9 months ago

Hi @mosaikme,

Fairness is only used when you pass sensitive_features in fit(); otherwise it is skipped.
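For example, this call trains without any fairness computation:

automl = AutoML()
automl.fit(X_train, y_train)  # no sensitive_features passed, so fairness is skipped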