Closed — nsorros closed this issue 1 year ago
This definitely sounds interesting, and welcome! We've got some metrics from scikit-lego that we will import, such as p_percent_score
(though I cannot promise a timeline, considering the pandemic).
It should also be said that at the moment, we're not affiliated with scikit-learn. The end goal is to consolidate the many separate fairness packages out there into a single standard for scikit-learn, at which point this will be declared an official "scikit-contrib" package. There are still steps that need to be taken to formalise that, and I don't know on what timeframe this might happen (again, pandemic).
@nsorros just wanted to let you know that we're still interested in this feature, are you working on it?
Got distracted with the whole covid situation. Will have a look at this this week, hopefully.
Just to make it clear: there is no rush, and there are bound to be more important matters than this commit. I was just wondering. Let me know if you'd appreciate help. :)
I lost my mind for a few minutes as I was looking for the repo and issue in scikit-lego and could not find anything. I will work on this today and I am sure I will have some questions about how best to proceed soon. At that point I will probably need help / guidance :)
Awesome. Feel free to make a PR that is in progress. I have some spare time today to have a peek.
And yeah ... there's an awkward transition from the scikit-lego project to this one. But we're nearing a release ... and it'd be grand to have this feature in there too. Pretty excited to get contributions from DataKind.
Just started a draft PR to get some initial thoughts https://github.com/koaning/scikit-fairness/pull/25/files.
At the moment it only works for binary cases and not for groups that can be a matrix, but we can generalise to both. The metrics we use are true positive rate, false positive rate, positive predictive value and false discovery rate. I wonder whether there should be more scores, like your equal opportunity score, and whether we should use terminology more commonly used in fairness.
The code could also benefit from some warnings, as you have for the fact that it works only for binary cases, and some tests, I guess.
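For reference, the four per-group metrics mentioned above could be sketched roughly like this (a minimal illustration, not the code in the PR; the function name and boolean-mask interface are assumptions):

```python
import numpy as np

def group_metrics(y_true, y_pred, mask):
    """Compute the four rates for the examples selected by a boolean group mask.

    Assumes binary labels (0/1); no guard against zero denominators in this sketch.
    """
    y_true = np.asarray(y_true)[mask]
    y_pred = np.asarray(y_pred)[mask]
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return {
        "tpr": tp / (tp + fn),  # true positive rate
        "fpr": fp / (fp + tn),  # false positive rate
        "ppv": tp / (tp + fp),  # positive predictive value
        "fdr": fp / (tp + fp),  # false discovery rate
    }
```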
Looking forward to hearing your thoughts
I already wrote my thoughts at the PR and I think you're in the right direction :) let's move the conversation there for now though.
closing this (cleaning đŸ§¹ issues today)
Hi, I am Nick. I work for the Wellcome Trust during the day and volunteer with DataKind UK at night. In both places I am somehow involved with ethics and fairness. I will be happy to contribute in one way or another to this library.
At Wellcome we have developed a relatively simple fairness report function, similar to sklearn's classification_report (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html), which computes true positive rate, false positive rate, etc. per group of interest instead of precision, recall and F1.
The API at the moment is:
fairness_report(y_true, y_pred, groups, group_names)
with groups indicating membership of a group of interest. groups can also be a matrix if examples can belong to multiple groups, in which case group_names gives the names of the columns. An additional average parameter can be added for more control over how those cases are handled; at the moment it defaults to micro. Let me know if this is an addition you are interested in.
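To give a feel for the signature, here is a hedged sketch of what such a fairness_report might look like (the report layout, default group_names, and metric choice are illustrative assumptions, not the actual Wellcome implementation):

```python
import numpy as np

def fairness_report(y_true, y_pred, groups, group_names=None):
    """Return a per-group text report of TPR and FPR, in the spirit of
    sklearn's classification_report. Illustrative sketch only."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    groups = np.asarray(groups)
    if groups.ndim == 1:  # single binary indicator -> one group column
        groups = groups[:, None]
    if group_names is None:
        group_names = [f"group_{i}" for i in range(groups.shape[1])]
    lines = [f"{'group':<12}{'tpr':>8}{'fpr':>8}"]
    for i, name in enumerate(group_names):
        mask = groups[:, i].astype(bool)
        t, p = y_true[mask], y_pred[mask]
        tp = np.sum((t == 1) & (p == 1))
        fn = np.sum((t == 1) & (p == 0))
        fp = np.sum((t == 0) & (p == 1))
        tn = np.sum((t == 0) & (p == 0))
        tpr = tp / max(tp + fn, 1)  # guard against empty groups
        fpr = fp / max(fp + tn, 1)
        lines.append(f"{name:<12}{tpr:>8.2f}{fpr:>8.2f}")
    return "\n".join(lines)
```

A matrix-valued groups argument would simply add one row per column, which is where the average parameter mentioned above would come in for overlapping memberships.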