Open haytham918 opened 5 months ago
Could you provide a minimal example with toy data, along with the versions of the packages behind the different models?
It is quite possible that we need to modify our Pipeline
implementation to be compatible with scikit-learn's metadata routing.
import pandas as pd
from imblearn.pipeline import Pipeline as ImbPipeline
from imblearn.over_sampling import SMOTE
from fairlearn.adversarial import AdversarialFairnessClassifier
from sklearn.preprocessing import MinMaxScaler, Normalizer
from sklearn.model_selection import GridSearchCV
import sklearn

# Metadata routing must be enabled before calling set_fit_request
sklearn.set_config(enable_metadata_routing=True)

# Toy data
data = {
    'race': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
    'indicator': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
}
X = pd.DataFrame(data)
Y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
# Sensitive features
Z = X['race']

# The classifier requests sensitive_features in fit
mitigator = AdversarialFairnessClassifier(
    backend="torch",
    predictor_model=[50, "relu"],
    adversary_model=[3, "relu"],
    batch_size=2**8,
    progress_updates=0.5,
    random_state=123,
).set_fit_request(sensitive_features=True)

# Scaling, then sampling, then the classifier
pipe = ImbPipeline([
    ("scaling", Normalizer()), ("sampling", SMOTE()), ("classifier", mitigator)])
param_grid = {
    "classifier__batch_size": [2**6]
}
grid_s = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid_s.fit(X, Y, sensitive_features=Z)
Here is a piece of code that demonstrates the issue. I also think fairlearn itself has some incompatibility issues at the moment.
I am currently trying to combine imblearn's sampling methods, such as SMOTE() and NearMiss(), with ThresholdOptimizer and AdversarialFairnessClassifier from fairlearn. When I put all of them in an imblearn.pipeline (sampling, then classifier), the sampling step fails. My guess is that it does not know what to do with the sensitive features we passed as metadata. Right now I am reworking my workflow to get around this, but I would like to know if there is a configuration or a feature that can easily solve this.
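One possible shape for such a workaround is sketched below, assuming the sampler only selects or repeats existing rows, so the same indices can be applied to the sensitive features. The names oversample_indices, resampled_cv_scores, and ToySensitiveClassifier are hypothetical helpers written for this illustration, not part of imblearn or fairlearn. The sketch does not cover SMOTE, which synthesizes new rows that have no sensitive-feature value of their own.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.model_selection import StratifiedKFold


def oversample_indices(y, rng):
    """Return row positions that balance the classes by random oversampling."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    picked = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Repeat existing rows of this class until it matches the largest class.
        extra = rng.choice(idx, size=target - count, replace=True)
        picked.append(np.concatenate([idx, extra]))
    return np.concatenate(picked)


class ToySensitiveClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical stand-in for an estimator whose fit takes sensitive_features."""

    def fit(self, X, y, sensitive_features=None):
        if sensitive_features is None or len(sensitive_features) != len(X):
            raise ValueError("sensitive_features must align with X")
        self.classes_, counts = np.unique(y, return_counts=True)
        self.majority_ = self.classes_[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)


def resampled_cv_scores(estimator, X, y, Z, n_splits=5, seed=0):
    """Cross-validate, resampling only the training rows of each fold."""
    X, y, Z = np.asarray(X), np.asarray(y), np.asarray(Z)
    rng = np.random.default_rng(seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train, test in cv.split(X, y):
        # Map positions within the training fold back to original row numbers.
        keep = train[oversample_indices(y[train], rng)]
        model = clone(estimator)
        # X, y and Z are indexed with the same rows, so they stay aligned.
        model.fit(X[keep], y[keep], sensitive_features=Z[keep])
        scores.append(model.score(X[test], y[test]))
    return np.array(scores)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = np.array([0] * 20 + [1] * 10)  # imbalanced toy labels
Z = rng.integers(1, 4, size=30)    # toy sensitive feature

scores = resampled_cv_scores(ToySensitiveClassifier(), X, y, Z)
print(scores.mean())
```

The test folds are left untouched, so the scores are computed on the original class balance. With imblearn's index-based samplers such as NearMiss, the fitted sample_indices_ attribute could play the role of oversample_indices here, though that should be checked against the installed version.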