[GSOC]: Fair Data Reweighting

pranjal-awasthi commented 2 years ago

Project Description: Fair data reweighting is a simple and effective pre-processing to ensure model fairness. The goal of this project is to design and develop an open source implementation of fair data reweighting, demonstrate its effectiveness on publicly available datasets and perform a study of how the algorithm performance and efficiency scales with increasing the number of protected subgroups. Time permitting, the project will also involve exploring algorithmic techniques for improving the reweighting algorithm and/or making it faster.

Towards the above goal we encourage participants to first complete the following starter tasks.

Note: The following tasks are open for multiple submissions. Anyone interested in the project is encouraged so solve the tasks below and submit their work.

Starter Task 1: Train a one layer fully connected network in TensorFlow to solve the binary classification problem of income prediction for the [UCI Adult Dataset]. Compute the overall model accuracy on the test set and also compute the performance gap in the model accuracy for points with sex=Male vs. points with sex=Female.

Starter Task 2: Recompute the same metrics and performance gap as above via the TensorFlow Fairness Indicators library.

Starter Task 3: Partition the training set into two subsets based on sex. Compute a new training set of the same size with half of the points coming from each set (implement uniform sampling with replacement). Re-train the model on the new training set and recompute the metrics and performance gap. Report your observations. Suggest one or two ways to improve the sampling of the training set that may result in a reduced performance gap.

Prakruthi12345 commented 2 years ago

Hello,

This is my colab notebook for the Starter Task 1.

https://github.com/Prakruthi12345/Starter/blob/main/StarterTask1.ipynb

schopra6 commented 2 years ago

Hello @pranjal-awasthi , I have completed Task 1 and Task 2 . Kindly share your feedback . Here is my colab notebook

https://colab.research.google.com/drive/1Pzs2SXbI14QSh_aVo_7aG4Pz2UyorKgP?usp=sharing

Thanks for creating the tasks. It was quite an experience going through fairness indicator module.

schopra6 commented 2 years ago

[@pranjal-awasthi Could you please confirm if training set of the same size means computing training set having equal number of samples based on sex and not necessarily same size as actual training set. Also, It would be extremely helpful if you could elaborate on uniform sampling with replacement or any reference I can refer to. I found multiple sources on random sampling from two datasets but not about uniform sampling.

Update I have done task 3 as well based on my initial understanding but I look forward to improve the results. Here is the notebook. https://colab.research.google.com/drive/1--bZmeyiu681GwQ1tJ8uHd9stKOax6-l?usp=sharing

Starter Task 3: Partition the training set into two subsets based on sex. Compute a new training set of the same size with half of the points coming from each set (implement uniform sampling with replacement). Re-train the model on the new training set and recompute the metrics and performance gap. Report your observations. Suggest one or two ways to improve the sampling of the training set that may result in a reduced performance gap.

bhaktipriya commented 2 years ago

Thanks a lot for working on the tasks! @bringingjoy, who is the primary owner for this task and I will take a look.

Note to future applicants: Please use a different public dataset of your choice instead of the UCI adult dataset to work on this task.

guneetsk99 commented 2 years ago

@bhaktipriya When we use the other dataset we need to keep it as a Binary classification problem only right?

vishesh-soni commented 2 years ago

I tried to work on task1 and task3 on UCL breast-cancer dataset. https://colab.research.google.com/drive/1hE194UNVm26plZD-hXrypN5T_XJTzs6q?usp=sharing

bhaktipriya commented 2 years ago

Hello,

This is my colab notebook for the Starter Task 1.

https://github.com/Prakruthi12345/Starter/blob/main/StarterTask1.ipynb

Thanks a lot for working on this! Your colab looks good to me. I would add some text to indicate what's being set up at each step. I would advise you to complete 2 and 3 also. Finally I would also add a section on conclusions that explains the effects of reweighting.

Please send your proposals to us. Emails are in the contributor document.

bhaktipriya commented 2 years ago

[@pranjal-awasthi Could you please confirm if training set of the same size means computing training set having equal number of samples based on sex and not necessarily same size as actual training set. Also, It would be extremely helpful if you could elaborate on uniform sampling with replacement or any reference I can refer to. I found multiple sources on random sampling from two datasets but not about uniform sampling.

Same size means number of examples. You are comparing two datasets of the same size, say 100 examples, one original and the other where half of the points coming from each set (implement uniform sampling with replacement).

Update I have done task 3 as well based on my initial understanding but I look forward to improve the results. Here is the notebook. https://colab.research.google.com/drive/1--bZmeyiu681GwQ1tJ8uHd9stKOax6-l?usp=sharing

Starter Task 3: Partition the training set into two subsets based on sex. Compute a new training set of the same size with half of the points coming from each set (implement uniform sampling with replacement). Re-train the model on the new training set and recompute the metrics and performance gap. Report your observations. Suggest one or two ways to improve the sampling of the training set that may result in a reduced performance gap. Your colab is great. Good explanation with results. Your implementation seems fine.

Thanks for working on it. Please send your proposals to us. Emails are in the contributor document.

bhaktipriya commented 2 years ago

@bhaktipriya When we use the other dataset we need to keep it as a Binary classification problem only right?

Yes.

guneetsk99 commented 2 years ago

Hi @bhaktipriya Can we still make submissions to the 3 starter tasks since I had been working on it and would require 2-3 days more to have the final output for all the three tasks

bhaktipriya commented 2 years ago

I tried to work on task1 and task3 on UCL breast-cancer dataset. https://colab.research.google.com/drive/1hE194UNVm26plZD-hXrypN5T_XJTzs6q?usp=sharing

This is super interesting! Thanks for working on this. Please quantify what bias is in the problem. Are you suggesting that left side experiences more bias than the right. Please use fairness indicators to measure the rates.

I do see that accuracy went up through resampling. That's great. You might want to add a few details about how imbalanced the initial dataset was and how balanced it was post sampling.

Also please send your proposals to us. Emails are in the contributor document.

bhaktipriya commented 2 years ago

Hi @bhaktipriya Can we still make submissions to the 3 starter tasks since I had been working on it and would require 2-3 days more to have the final output for all the three tasks

Yes.

jennyhamer commented 2 years ago

Hi all, my apologies for the delay in providing you feedback. I am working through each of your starter tasks in the order you submitted them. (This is @bringingjoy, btw - I'm unfortunately locked out of my account due to 2-auth and changed phone number.)

jennyhamer commented 2 years ago

Hello,

This is my colab notebook for the Starter Task 1.

https://github.com/Prakruthi12345/Starter/blob/main/StarterTask1.ipynb

Hi @Prakruthi12345, thank you for your submission for starter task 1! As @bhaktipriya suggested, it would be useful to work on tasks 2 and 3 and provide some discussion/conclusion.

Here is detailed feedback on the Colab you submitted. I hope this feedback helps you identify areas to focus on to improve as an open-source ready programmer!

Lots of code duplication: next time, move reused code into functions; move all imports to top of Colab and don’t repeat imports
Better to modularize the training and testing setup: forward pass can be reused by the test module
- E.g. define activation function in a standalone Python function. (this can then be passed into the training function and be swapped out with other activation functions without changing the training logic itself)
Please start to reference the Google Python Style Guide: it is an excellent resource for variable naming conventions, syntax, and many other Python best practices. https://google.github.io/styleguide/pyguide.html
We suggested using the TensorFlow to implement the one-layer connected network, since TensorFlow will be the library we use for our ML models.
Print out results and use Python built-in rounding instead of copy-pasting into a Text box to reduce chances of introducing typos/errors.

jennyhamer commented 2 years ago

A general comment/feedback to everyone submitting starter tasks:

When creating a publicly available Colabs, consider adding an "author" field (i.e. with your GitHub handle). If anybody comes across your work, has questions, comments, or corrections, they know who produced it and who to contact (or who to cite if they use your work).
Similarly, if you use publicly available tutorials or any existing code either as a guide or by directly copying existing code, give credit to the authors and include a link(s). In other words, cite your sources. This is useful, because:
- Clarifies potential issues of plagiarism.
- If there are any errors in the code introduced by someone other than you, you can traceback the source or more easily reference any updates/errata.
- Increases visibility of helpful educational materials!

Thank you all for your submissions! I will continue going through and providing feedback today. If you'd like to work on this project, please start putting together your proposals and note down any specific questions you may have. (Reminder that emails are in the contributor document.)

jennyhamer commented 2 years ago

I have completed Task 1 and Task 2 . Kindly share your feedback . Here is my colab notebook

https://colab.research.google.com/drive/1Pzs2SXbI14QSh_aVo_7aG4Pz2UyorKgP?usp=sharing

Hi @schopra6 , thank you for your submission for tasks 1 and 2! Overalls, looks good! Here is my feedback (things you did well and room for improvement).

Good use of structuring your Colab for better readability and navigation. This type of practice with section descriptions (brief or more detailed depending on complexity) is useful for users when producing open-sourced Colab demos.
Good use of commenting: descriptive where necessary and not too cluttered.
- General best practice: opt for descriptive variable naming and coding practices to reduce need for comments (comments can easily go stale)
Nit: Unless there are cells you may only run under certain circumstances, it’s a good idea to put all your imports at the top of the Colab. If there are any issues with loading the imports, you’ll catch them at the start and not once you’ve done a bunch of work.
- Especially important if any may need to be installed in your current runtime. Once installed, it’s often necessary to restart the runtime, which will lose all the preceding computation. (i.e. move Fairness Indicators-related imports to a nested section under the initial “Setup” section, but leave the actual setup code where it is.)
Generally descriptive variable naming: consider using a different name for something like “train_inpf” or “test_inpf” (looks like train_inpf means train_function?).
Fairness Indicators thresholds:
- State reason for selection of thresholds. This wasn’t part of the starter task, so including rationale would be great to understand your motivation and what you were trying to achieve/explore. :]
- Anytime you use “magics” or hardcoded values, it’s important to describe them. In some cases, these should be made constants (check the Python Style Guide).
- You can also use np.linspace() to create an array of evenly spaced values.
Spend a little more time fleshing out the Comments/Discussion section. You selected a variety of thresholds to evaluate at, but I don’t see results for these. Also, the Fairness Indicators tool was designed to give you insight and information about the dataset and performance on different slices. Do you notice anything about this dataset in the context of the starter task specifications?

kishoreKunisetty commented 2 years ago

Hello @bhaktipriya @jennyhamer @pranjal-awasthi , I have attempted to solve the given 3 tasks.

Dataset : UCI credit approval dataset

here is the colab notebook link : https://colab.research.google.com/drive/14OmYm8tyUHTO8D8oR4SvayXgudG9KD0z?usp=sharing

please review the notebook and give suggestions

Thanks to @schopra6 for providing such good notebook. it helped me learn how to work with fairness Indicators and model analysis.

Thank you.

CLSchmitz commented 2 years ago

Hi there, my solution for this starter task is here.

tensorflow / model-remediation

[GSOC]: Fair Data Reweighting #28