koaning / scikit-lego

Extra blocks for scikit-learn pipelines.
https://koaning.github.io/scikit-lego/
MIT License
1.28k stars 118 forks source link

make FairClassifier data-agnostic #669

Closed DeaMariaLeon closed 6 months ago

DeaMariaLeon commented 6 months ago

Before working on a large PR, please check with @FBruzzesi or @koaning to confirm that they agree with the direction of the PR. This discussion should take place in a Github issue before working on the PR, unless it's a minor change like spelling in the docs.

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.

Trying to make FairClassifier data-agnostic. Issue https://github.com/koaning/scikit-lego/issues/658 The issue is equal_opportunity_score is used that to test it. But this is a public function, so Narwhals can't be added there yet. I did try but a lot of changes needed to be done (I think).

That's why I added this to tests/test_estimators/test_equal_opportunity.py:

ln 121  if isinstance(X, pl.DataFrame):
           X = pd.DataFrame(X.to_dict())
           y = pd.Series(y.to_list())

Type of change

Checklist:

If you feel your PR is ready for a review, ping @FBruzzesi or @koaning. @MarcoGorelli

MarcoGorelli commented 6 months ago

hey

But this is a public function, so Narwhals can't be added there yet.

Not sure I understand, could you clarify what you mean please?

Does it not work to do

--- a/sklego/metrics.py
+++ b/sklego/metrics.py
@@ -152,7 +152,7 @@ def equal_opportunity_score(sensitive_column, positive_target=1):
         """Remember: X is the thing going *in* to your pipeline."""
         sensitive_col = X[:, sensitive_column] if isinstance(X, np.ndarray) else X[sensitive_column]

-        if not np.all((sensitive_col == 0) | (sensitive_col == 1)):
+        if not ((sensitive_col == 0) | (sensitive_col == 1)).all():

?

DeaMariaLeon commented 6 months ago

Ciao and thanks to both of you!

I meant that equal_opportunity_score is part of the API, and can be used with other classifiers. I thought that I shouldn't change that part if others are & can use this function. Specially because I had earlier tried Marco's solution, but also something dumb and was breaking metrics. 🙈

Long-story short, it seems to work.. thanks for the quick feedback.

DeaMariaLeon commented 6 months ago

sensitive_classification_dataset is also used here edit: (line 68...): https://github.com/koaning/scikit-lego/blob/main/tests/test_metrics/test_equal_opportunity.py on a LogisticRegression.. Should I really remove it?

That breaks the tests bellow. To avoid that, I would have to convert from polars to pandas before "feeding" the data to LogisticRegression.. am I missing something?

ERROR tests/test_metrics/test_equal_opportunity.py::test_p_percent_numpy
ERROR tests/test_metrics/test_equal_opportunity.py::test_warning_is_logged
ERROR tests/test_metrics/test_p_percent.py::test_p_percent_pandas
ERROR tests/test_metrics/test_p_percent.py::test_p_percent_numpy
ERROR tests/test_metrics/test_p_percent.py::test_warning_is_logged

@FBruzzesi

DeaMariaLeon commented 6 months ago

Just to clarify, I did all the requested changes - except:

Could you please remove the sensitive_classification_dataset fixture

That way the errors I showed earlier are gone. (Until new instructions 🤓)