raghav-khanna / Facility-Location-India


[Ziko et.al] Algorithm keeps producing fair results despite setting 'fairness' to False and lambda = 0.0 #13

Open DarkMenacer opened 8 months ago

DarkMenacer commented 8 months ago

Description

The algorithm has a variable called 'fairness' on line 71 of test_fair_clustering.py. According to the code, setting this value to False should produce unfair clustering results. It also has a trade-off controller lambda, which should likewise produce unfair results when set to 0.
However, when testing on intuitive datasets, the output still looks like that of a fair algorithm rather than an unfair one.
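
For reference, a sketch of why lambda = 0 is expected to switch fairness off. This assumes the objective has the penalized form described in the Ziko et al. paper as we understand it (a clustering term plus lambda times a KL fairness term). The symbols are our own notation and have not been checked against the code:

```latex
% Assumed form of the fair-clustering objective (notation is ours):
%   F(S)  : ordinary clustering cost (e.g. K-means) of the assignment S
%   U     : target proportions of the protected groups
%   P_k   : proportions of the protected groups inside cluster k
\[
  E(S) \;=\; F(S) \;+\; \lambda \sum_{k=1}^{K} D_{\mathrm{KL}}\!\left(U \,\|\, P_k\right)
\]
% With \lambda = 0 the fairness term vanishes:
\[
  \lambda = 0 \;\Longrightarrow\; E(S) = F(S)
\]
```

If the code follows this form, lambda = 0 should reduce to plain (unfair) clustering, which is why the observed output is unexpected.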

To do

Understand how the variable 'fairness' affects the code, and how to produce unfair results when it is set to False (likewise when the trade-off controller lambda is set to 0).

Example

After setting 'fairness' to false, an intuitive dataset like

produces the following output:

Screenshot 2023-10-29 at 00 01 00

Whereas it should produce:

Screenshot 2023-10-29 at 00 02 18
raghav-khanna commented 8 months ago

Some major observations for understanding the issue

To imitate the example above, we created a dataset for our specific case:

id,name,state,10-14,15-19,latitude,longitude
1,Mumbai ,MAHARASHTRA ,0,1,0.333909,1.272195
2,Pune ,MAHARASHTRA ,0,1,0.322582,1.289061
3,Nashik ,MAHARASHTRA ,0,1,0.348765,1.287309
4,Ahmadnagar ,MAHARASHTRA ,0,1,0.411741,1.273569
5,Kolhapur ,MAHARASHTRA ,1,0,0.291182,1.295745
6,Solapur ,MAHARASHTRA ,1,0,0.308269,1.325013
7,Belgaum ,KARNATAKA ,1,0,0.276900,1.300422
8,Dharwad ,KARNATAKA ,1,0,0.269694,1.309014

The dataset has 2 protected groups: the 10-14 age group and the 15-19 age group. The first 4 data points belong to the second protected group (15-19) and the last 4 belong to the first (10-14). Their latitude and longitude values also place them in a layout similar to the example in the comment above.
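
A minimal sketch of how the group membership described above can be read out of this CSV. It is not the loader used in the repository: `load_points` and the dictionary keys are hypothetical names chosen for illustration, and the CSV text is just the dataset pasted above.

```python
import csv
import io

# The dataset from this comment, pasted verbatim (names keep their trailing spaces).
CSV_TEXT = """id,name,state,10-14,15-19,latitude,longitude
1,Mumbai ,MAHARASHTRA ,0,1,0.333909,1.272195
2,Pune ,MAHARASHTRA ,0,1,0.322582,1.289061
3,Nashik ,MAHARASHTRA ,0,1,0.348765,1.287309
4,Ahmadnagar ,MAHARASHTRA ,0,1,0.411741,1.273569
5,Kolhapur ,MAHARASHTRA ,1,0,0.291182,1.295745
6,Solapur ,MAHARASHTRA ,1,0,0.308269,1.325013
7,Belgaum ,KARNATAKA ,1,0,0.276900,1.300422
8,Dharwad ,KARNATAKA ,1,0,0.269694,1.309014
"""


def load_points(text):
    """Hypothetical helper: parse the CSV and tag each point with its protected group."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        # Each row has exactly one of the two indicator columns set to 1.
        group = "10-14" if row["10-14"] == "1" else "15-19"
        points.append({
            "name": row["name"].strip(),
            "group": group,
            "xy": (float(row["latitude"]), float(row["longitude"])),
        })
    return points


points = load_points(CSV_TEXT)
groups = [p["group"] for p in points]
for p in points:
    print(p["name"], p["group"], p["xy"])
```

The `groups` list can then be compared against whatever cluster labels the algorithm returns.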


First run with this dataset

Different colors represent different clusters and different shapes represent different protected groups. Cluster centers are represented by circles of that color.

scatter_plot

image

This image (of 'Making sense of the output') clearly shows that the output is not balanced! Surprising!

What could be the issue?

Things like normalizing the dataset before passing it to the algorithm, and changing the fairness variable in the code to False, did not change the output. We then thought of changing lambda to a single value instead of a range (lambda_tune).

Changing the value for lambda

Output for lambda = 1

scatter_plot

image

This is an absolutely balanced output (perfect representation of both protected groups in both clusters)! Seems like a victory! But why does setting lambda = 1 suddenly give such a convincing result? On further trials, lambda = 1, 2, 3, ..., 15, 16 gave the same result, but for lambda >= 17 it started giving unfair results. But why? Also, this time, changing the fairness variable in the code to False did give a different (unfair) result.
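
To judge "balanced" versus "unfair" numerically rather than by eye, something like the following could be run on the cluster labels from each lambda. This is only a sketch: `cluster_balance` is a hypothetical helper, balance is taken here as the smallest group count divided by the largest group count within a cluster, and the two label lists are made-up examples, not actual outputs of the algorithm.

```python
from collections import Counter


def cluster_balance(labels, groups):
    """Hypothetical helper: per-cluster balance = min group count / max group count.

    1.0 means all protected groups are equally represented in the cluster,
    0.0 means at least one protected group is missing from it.
    """
    all_groups = sorted(set(groups))
    counts = {}
    for cluster, group in zip(labels, groups):
        counts.setdefault(cluster, Counter())[group] += 1
    balance = {}
    for cluster, counter in counts.items():
        sizes = [counter.get(g, 0) for g in all_groups]
        balance[cluster] = min(sizes) / max(sizes)
    return balance


# Protected groups of the 8 points, in dataset order.
groups = ["15-19"] * 4 + ["10-14"] * 4

# Made-up label vectors for illustration only (not real algorithm output).
balanced_labels = [0, 0, 1, 1, 0, 0, 1, 1]    # each cluster gets 2 + 2
segregated_labels = [0, 0, 0, 0, 1, 1, 1, 1]  # each cluster is a single group

print(cluster_balance(balanced_labels, groups))
print(cluster_balance(segregated_labels, groups))
```

Printing this for each lambda in the sweep would show exactly where the result flips between 16 and 17.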


Important takeaways

Questions to answer