pranjal-awasthi opened this issue 2 years ago
Hello,
This is my Colab notebook for Starter Task 1.
https://github.com/Prakruthi12345/Starter/blob/main/StarterTask1.ipynb
Hello @pranjal-awasthi, I have completed Task 1 and Task 2. Kindly share your feedback. Here is my Colab notebook:
https://colab.research.google.com/drive/1Pzs2SXbI14QSh_aVo_7aG4Pz2UyorKgP?usp=sharing
Thanks for creating the tasks. It was quite an experience going through the Fairness Indicators module.
@pranjal-awasthi, could you please confirm whether "training set of the same size" means a training set with an equal number of samples for each sex, and not necessarily the same size as the actual training set? Also, it would be extremely helpful if you could elaborate on uniform sampling with replacement or point me to a reference. I found multiple sources on random sampling from two datasets but none about uniform sampling.
Update: I have done Task 3 as well, based on my initial understanding, but I look forward to improving the results. Here is the notebook: https://colab.research.google.com/drive/1--bZmeyiu681GwQ1tJ8uHd9stKOax6-l?usp=sharing
Thanks a lot for working on the tasks! @bringingjoy (the primary owner for this task) and I will take a look.
Note to future applicants: Please use a different public dataset of your choice instead of the UCI adult dataset to work on this task.
@bhaktipriya When we use another dataset, do we need to keep it as a binary classification problem?
I tried to work on Task 1 and Task 3 using the UCI Breast Cancer dataset. https://colab.research.google.com/drive/1hE194UNVm26plZD-hXrypN5T_XJTzs6q?usp=sharing
> This is my Colab notebook for Starter Task 1.
> https://github.com/Prakruthi12345/Starter/blob/main/StarterTask1.ipynb

Thanks a lot for working on this! Your colab looks good to me. I would add some text to indicate what's being set up at each step. I would advise you to complete tasks 2 and 3 as well. Finally, I would also add a section on conclusions that explains the effects of reweighting.
Please send your proposals to us. Emails are in the contributor document.
> @pranjal-awasthi, could you please confirm whether "training set of the same size" means a training set with an equal number of samples for each sex, and not necessarily the same size as the actual training set? Also, it would be extremely helpful if you could elaborate on uniform sampling with replacement or point me to a reference. I found multiple sources on random sampling from two datasets but none about uniform sampling.

Same size means the same number of examples. You are comparing two datasets of the same size, say 100 examples: the original one, and another in which half of the points come from each subset (implemented with uniform sampling with replacement).
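To make "with replacement" concrete using the 100-example illustration above: each of the 50 draws from a subset picks any of its m examples with probability 1/m, independently of earlier draws, so the same example can be picked more than once. The expected number of distinct examples among k such draws is m(1 - (1 - 1/m)^k). The sketch below evaluates that formula and checks it by simulation with Python's `random.choices`; the subset size of 30 and the function names are hypothetical, chosen only for illustration.

```python
import random


def expected_distinct(m: int, k: int) -> float:
    """Expected number of distinct items after k uniform draws with replacement from m items."""
    return m * (1 - (1 - 1 / m) ** k)


def simulated_distinct(m: int, k: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity; random.choices samples uniformly with replacement."""
    rng = random.Random(seed)
    items = list(range(m))
    total = 0
    for _ in range(trials):
        total += len(set(rng.choices(items, k=k)))
    return total / trials


# Hypothetical case: the smaller subset has 30 examples and we need 50 of them (half of 100).
print(round(expected_distinct(30, 50), 2))
print(round(simulated_distinct(30, 50), 2))
```

In this hypothetical case only about 24 or 25 of the 30 minority examples appear, several of them more than once, which is the expected behavior of sampling with replacement when a subset is smaller than its share of the new training set.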
> Update: I have done Task 3 as well, based on my initial understanding, but I look forward to improving the results. Here is the notebook: https://colab.research.google.com/drive/1--bZmeyiu681GwQ1tJ8uHd9stKOax6-l?usp=sharing

Your colab is great. Good explanation with results. Your implementation seems fine.
Thanks for working on it. Please send your proposals to us. Emails are in the contributor document.
> @bhaktipriya When we use another dataset, do we need to keep it as a binary classification problem?

Yes.
Hi @bhaktipriya, can we still make submissions for the three starter tasks? I have been working on them and need 2-3 more days to finish all three.
> I tried to work on Task 1 and Task 3 using the UCI Breast Cancer dataset. https://colab.research.google.com/drive/1hE194UNVm26plZD-hXrypN5T_XJTzs6q?usp=sharing

This is super interesting! Thanks for working on this. Please quantify what the bias in the problem is. Are you suggesting that the left side experiences more bias than the right? Please use Fairness Indicators to measure the rates.
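As a manual cross-check of the per-slice rates that Fairness Indicators reports, here is a plain-Python sketch that computes the false positive rate and false negative rate for each slice (for example, left vs. right side). It does not use the Fairness Indicators API; the function name and the toy arrays are hypothetical.

```python
from collections import defaultdict


def rates_by_group(y_true, y_pred, groups):
    """Return {group: {"fpr": ..., "fnr": ...}} for binary labels and predictions (0 or 1)."""
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        if t == 1:
            c["tp" if p == 1 else "fn"] += 1
        else:
            c["fp" if p == 1 else "tn"] += 1
    result = {}
    for g, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        result[g] = {
            # None when a slice has no examples of that class, so the rate is undefined.
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return result


# Toy data, made up for illustration only.
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 1, 1]
side = ["left", "left", "left", "left", "right", "right", "right", "right"]
print(rates_by_group(y_true, y_pred, side))
```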
I do see that accuracy went up through resampling. That's great. You might want to add a few details about how imbalanced the initial dataset was and how balanced it was after resampling.
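One way to report the imbalance before and after resampling is to print the share of each group in both training sets. A minimal sketch, with a hypothetical function name and made-up group labels:

```python
from collections import Counter


def group_proportions(groups):
    """Return the fraction of examples in each group, keyed in sorted order."""
    counts = Counter(groups)
    total = sum(counts.values())
    return {g: counts[g] / total for g in sorted(counts)}


# Hypothetical group labels: 80/20 before resampling, 50/50 after.
before = ["group_a"] * 80 + ["group_b"] * 20
after = ["group_a"] * 50 + ["group_b"] * 50
print(group_proportions(before))  # {'group_a': 0.8, 'group_b': 0.2}
print(group_proportions(after))   # {'group_a': 0.5, 'group_b': 0.5}
```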
Also please send your proposals to us. Emails are in the contributor document.
> Hi @bhaktipriya, can we still make submissions for the three starter tasks? I have been working on them and need 2-3 more days to finish all three.

Yes.
Hi all, my apologies for the delay in providing feedback. I am working through each of your starter tasks in the order you submitted them. (This is @bringingjoy, by the way; I'm unfortunately locked out of my account due to two-factor authentication and a changed phone number.)
Hi @Prakruthi12345, thank you for your submission for starter task 1! As @bhaktipriya suggested, it would be useful to work on tasks 2 and 3 and provide some discussion/conclusion.
Here is detailed feedback on the Colab you submitted. I hope this feedback helps you identify areas to focus on to improve as an open-source ready programmer!
A general comment/feedback to everyone submitting starter tasks:
Thank you all for your submissions! I will continue going through and providing feedback today. If you'd like to work on this project, please start putting together your proposals and note down any specific questions you may have. (Reminder that emails are in the contributor document.)
Hi @schopra6, thank you for your submission for tasks 1 and 2! Overall, it looks good! Here is my feedback (things you did well and room for improvement).
Hello @bhaktipriya @jennyhamer @pranjal-awasthi , I have attempted to solve the given 3 tasks.
Dataset: UCI Credit Approval dataset
Here is the Colab notebook link: https://colab.research.google.com/drive/14OmYm8tyUHTO8D8oR4SvayXgudG9KD0z?usp=sharing
Please review the notebook and give suggestions.
Thanks to @schopra6 for providing such a good notebook. It helped me learn how to work with Fairness Indicators and Model Analysis.
Thank you.
Project Description: Fair data reweighting is a simple and effective pre-processing technique for ensuring model fairness. The goal of this project is to design and develop an open source implementation of fair data reweighting, demonstrate its effectiveness on publicly available datasets, and study how the algorithm's performance and efficiency scale as the number of protected subgroups increases. Time permitting, the project will also involve exploring algorithmic techniques for improving the reweighting algorithm and/or making it faster.
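As a minimal illustration of the general idea (not the reweighting algorithm this project aims to develop), one simple scheme gives each example a weight inversely proportional to its subgroup's frequency, so that every protected subgroup contributes the same total weight to the training loss. It works for any number of subgroups. The function name is hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(groups):
    """Per-example weights so that each subgroup's weights sum to len(groups) / num_subgroups.

    The weights average to 1 overall, so the effective dataset size is unchanged.
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]


groups = ["Male"] * 6 + ["Female"] * 2
weights = inverse_frequency_weights(groups)
print(weights)
```

Here each "Male" example gets weight 8 / 12 and each "Female" example gets weight 8 / 4 = 2.0, so both subgroups sum to 4. Weights like these could, for example, be passed as the `sample_weight` argument of Keras `Model.fit`.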
Towards the above goal we encourage participants to first complete the following starter tasks.
Note: The following tasks are open for multiple submissions. Anyone interested in the project is encouraged to solve the tasks below and submit their work.
Starter Task 1: Train a one-layer fully connected network in TensorFlow to solve the binary classification problem of income prediction on the UCI Adult dataset. Compute the overall model accuracy on the test set, and also compute the performance gap in model accuracy between points with sex=Male and points with sex=Female.
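A minimal plain-Python sketch of the two metrics Task 1 asks for, assuming predictions have already been thresholded to 0/1 (for example, the TensorFlow model's sigmoid output at 0.5) and that the test set's sex column holds the strings "Male" and "Female". Training the network itself is not shown, the gap is reported here as an absolute difference (an assumption), and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the labels (assumes a non-empty input)."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_gap(y_true, y_pred, sex):
    """Return overall accuracy, per-group accuracy, and the absolute Male-vs-Female gap."""
    overall = accuracy(y_true, y_pred)
    per_group = {}
    for group in ("Male", "Female"):
        idx = [i for i, s in enumerate(sex) if s == group]
        if not idx:
            raise ValueError(f"no test examples with sex={group}")
        per_group[group] = accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx])
    gap = abs(per_group["Male"] - per_group["Female"])
    return overall, per_group, gap


# Toy example with made-up values, not results from the Adult dataset.
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0]
sex = ["Male", "Male", "Male", "Female", "Female", "Female"]
overall, per_group, gap = accuracy_gap(y_true, y_pred, sex)
print(overall, per_group, gap)
```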
Starter Task 2: Recompute the same metrics and performance gap as above via the TensorFlow Fairness Indicators library.
Starter Task 3: Partition the training set into two subsets based on sex. Compute a new training set of the same size with half of the points coming from each set (implement uniform sampling with replacement). Re-train the model on the new training set and recompute the metrics and performance gap. Report your observations. Suggest one or two ways to improve the sampling of the training set that may result in a reduced performance gap.
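The partition-and-resample step of Task 3 can be sketched as follows, interpreting "uniform sampling with replacement" as: every draw picks any example of the chosen subset with equal probability, independently of earlier draws. The function name, the `group_of` accessor, and the dictionary-shaped examples are hypothetical; re-training and the metric comparison are not shown.

```python
import random


def balanced_resample(examples, group_of, groups=("Male", "Female"), seed=0):
    """Build a training set the same size as `examples`, half drawn from each group.

    Draws are uniform and with replacement within each group, so small groups yield
    repeated examples. An empty group makes random.choices raise IndexError.
    """
    rng = random.Random(seed)
    # Partition the training set by the protected attribute.
    subsets = {g: [ex for ex in examples if group_of(ex) == g] for g in groups}
    n = len(examples)
    half = n // 2
    # random.choices picks each element of a subset with equal probability on every draw.
    sample = rng.choices(subsets[groups[0]], k=half) + rng.choices(subsets[groups[1]], k=n - half)
    rng.shuffle(sample)  # avoid all examples of one group appearing first
    return sample


train = [{"sex": "Male", "id": i} for i in range(7)] + [{"sex": "Female", "id": i} for i in range(7, 10)]
new_train = balanced_resample(train, group_of=lambda ex: ex["sex"])
print(len(new_train), sum(ex["sex"] == "Female" for ex in new_train))  # 10 5
```

With only 3 "Female" examples available, the 5 "Female" slots in the new set are necessarily filled with repeats.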