Closed: bhaktipriya closed this issue 2 years ago.
Share your colab here for reviews.
For this task, would it be okay to train a small model like MobileNetV2 to minimize training time on the CelebA dataset?
Yes. Any baseline model for image classification works! Thanks, Varun!
Hi @bhaktipriya, I wanted some clarification about the task.
What does "binary protected characteristic" mean for the slice information? Do we need to consider male and female data separately?
I am considering that Attractiveness can be affected by a person's age (Young or not) and emotional state (Smiling or not). Does male/female also need to be taken into account?
Hello @bhaktipriya, as asked, I have implemented the given task. Please check out my notebook here: https://colab.research.google.com/drive/14dhbf66TTLeKrP2CUbEuzh3wbd00KBIM?usp=sharing
Thanks a lot! I really learned a lot and enjoyed doing this.
@PrinceP great questions! I'd encourage you to work on the colab too.
On slices: Compare how many males are marked as attractive and how many females are marked as attractive. Report these in the colab. See if there can be a better formulation for the problem.
> I am considering that Attractiveness can be affected by a person's age (Young or not) and emotional state (Smiling or not). Does male/female also need to be taken into account?
100% agree. Analyse how the labels are distributed across these slices: Attractive/Young, Attractive/Old, Attractive/Young/Male, Attractive/Young/Female, etc. You can also use the Know Your Data tool to get these numbers. Identify which labels are highly correlated using PMI. Once you identify the slices that have bias, apply MinDiff. Do mention how you went about the selection of these slices in your colab.
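For anyone unsure where to start, the count/PMI analysis could look roughly like the sketch below. It assumes the TFDS celeb_a attribute names "Attractive" and "Male" and subsamples the train split just to keep it quick:

import numpy as np
import tensorflow_datasets as tfds

# Count how often Attractive and Male co-occur on a subsample of the train split.
ds = tfds.load("celeb_a", split="train")
n = n_a = n_m = n_am = 0
for example in tfds.as_numpy(ds.take(20000)):
    a = bool(example["attributes"]["Attractive"])
    m = bool(example["attributes"]["Male"])
    n += 1
    n_a += a
    n_m += m
    n_am += (a and m)

p_a, p_m, p_am = n_a / n, n_m / n, n_am / n
# PMI > 0 means the two labels co-occur more often than if they were independent.
pmi = np.log(p_am / (p_a * p_m))
print(f"P(Attractive)={p_a:.3f}  P(Male)={p_m:.3f}  P(both)={p_am:.3f}  PMI={pmi:.3f}")

The same loop extends naturally to the other slices (Young, Smiling, and their combinations with Attractive).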
@varuniyer yes, you can use a simple model too. You can also work on a different dataset of your choice (UCI Adult income dataset: gender; German credit dataset: age) and add your colab here. I have also posted another task with TFCO here: https://github.com/tensorflow/model-remediation/issues/27
The COMPAS dataset is also something you can work with: https://www.tensorflow.org/responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_Lineage_Case_Study. Focus on the fairness metrics for African-American and Caucasian defendants.
@YASH-GU24 excellent colab and great execution! Please see if you can tune the MinDiff model for better performance by adjusting the kernel size or MinDiff weights. Thanks a lot for working on it!
Thanks for the feedback @bhaktipriya. As far as the MinDiff model is concerned, I am quite sure its accuracy will improve if we increase the number of epochs from 1 to a larger number, but I didn't do it because the training time was quite long and our basic aim was to apply MinDiff and compare results. I will surely try to improve the accuracy and let you know the results.
Hello @bhaktipriya,
The task for the CelebA dataset was mostly covered in the given link and by @YASH-GU24 (https://www.tensorflow.org/responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_TFCO_CelebA_Case_Study), so I tried to do it manually to get better insight.
I have trained the original model along with MinDiff. Would you mind having a look at the code to confirm I have been on the correct path?
Also, would you mind sharing some key insights into how MinDiff works? Does it just equalize the data slices, or does it do something else too?
One more thing: are we going to address feedback parity issues to improve the biased model? I faced many difficulties when implementing the model at an industry level because the number of users varies based on the results.
Please download the dataset to run the given colab, as I implemented it in PyCharm.
Program Code: https://colab.research.google.com/drive/1hbECDaZmKn8A8aUsRZngmwHczR8xgiby?hl=en#scrollTo=c2zGiJGf0Xas
Hello @bhaktipriya, I tried to work on the task. Please look through the notebook and give feedback: https://colab.research.google.com/drive/15ldpOLeyxjWHPfg3DQ3rrtbe1zJ5gbbR?usp=sharing
@ronakkkk, @vishesh-soni and future contributors, I highly recommend doing this analysis on datasets other than CelebA (https://github.com/tensorflow/model-remediation/issues/26#issuecomment-1067527433) since @YASH-GU24 already has a colab for CelebA.
If you do wish to proceed with CelebA, provide an analysis of which slices (other than male/female, say young/old) are underperforming and may be subject to bias. See https://github.com/tensorflow/model-remediation/issues/26#issuecomment-1067396897. You can experiment with MinDiff on those slices.
Also try tuning MinDiff for better performance. See the comment here: https://github.com/tensorflow/model-remediation/issues/26#issuecomment-1067550604
@bhaktipriya Okay, I will change the code accordingly. Thank you.
@bhaktipriya I have tried to tune the MinDiff model using the AlexNet and MobileNetV2 models. I also tried augmenting the training data, but the AlexNet model performs best without augmentation, so I am sharing the Google colab for it.
Link: https://colab.research.google.com/drive/1YGnRLqT2DvncfwppAQ17fli_OtyWqXkh?usp=sharing
This is great @ronakkkk. Improvements both in FPR and FNR. The diff between the slices has definitely reduced. Well documented and great execution.
What was the augmentation that you performed? I think I'm missing that.
Your data prep section's documentation is not doing what it says:

"A split for Attractive examples referring to males. This can easily be done by using the filter() method to filter out all the examples which are not male or not attractive. We will name this dataset "dataset_train_sensitive". A split for Attractive examples referring to females. This can also easily be done by using the filter() method to filter out all the examples which are male or not attractive. We will name this dataset "dataset_train_nonsensitive"."

You're explicitly filtering for attractive samples :)
Also, please send your proposals to us. Emails and instructions are in the contributor document.
https://colab.research.google.com/drive/15ldpOLeyxjWHPfg3DQ3rrtbe1zJ5gbbR?usp=sharing
Thanks for sharing. Very well documented and experimented. Optional: I would also try running MinDiff on the male and female slices where the label is non-attractive.
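If it helps, building those non-attractive male/female slices for MinDiff could look roughly like this (a sketch assuming the TFDS celeb_a attributes dictionary; sensitive_group_ds and nonsensitive_group_ds are just placeholder names):

import tensorflow as tf
import tensorflow_datasets as tfds

ds_train = tfds.load("celeb_a", split="train")

# Examples labelled non-attractive where Male is False.
def non_attractive_female(example):
    attrs = example["attributes"]
    return tf.logical_and(tf.logical_not(attrs["Attractive"]),
                          tf.logical_not(attrs["Male"]))

# Examples labelled non-attractive where Male is True.
def non_attractive_male(example):
    attrs = example["attributes"]
    return tf.logical_and(tf.logical_not(attrs["Attractive"]), attrs["Male"])

sensitive_group_ds = ds_train.filter(non_attractive_female)
nonsensitive_group_ds = ds_train.filter(non_attractive_male)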
Thanks a lot for the work @vishesh-soni. Also, please send your proposals to us. Emails and instructions are in the contributor document.
All: Please send your proposals to us. Emails and instructions are in the contributor document.
@bhaktipriya I have done the augmentation using
data_augmentation = tf.keras.Sequential([
tf.keras.layers.RandomFlip("horizontal_and_vertical"),
tf.keras.layers.RandomRotation(0.2),
])
I added it to the AlexNet model, but it doesn't improve the accuracy of the MinDiff model, so I haven't included it in the code.
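For reference, such an augmentation pipeline can be applied to a tf.data training set as in the snippet below (a sketch; dataset_train is a placeholder for a dataset yielding (image, label) batches):

# Apply the augmentation only to the training data; evaluation data stays untouched.
augmented_train = dataset_train.map(
    lambda image, label: (data_augmentation(image, training=True), label),
    num_parallel_calls=tf.data.AUTOTUNE)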
For further improvement, I tried the VGG16 model, but its accuracy only reaches about 0.55, so I haven't moved further with it.
VGG16 Model:
# VGG 16 model
import tensorflow as tf
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     Dense, Dropout, Flatten, MaxPool2D)

def vgg16_model():
    vgg16 = tf.keras.models.Sequential()
    # 1st convolutional layer
    vgg16.add(Conv2D(filters=64, input_shape=(128, 128, 3), kernel_size=(3, 3), padding='same', name='image'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    # 2nd convolutional layer
    vgg16.add(Conv2D(filters=128, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    vgg16.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
    # 3rd convolutional layer
    vgg16.add(Conv2D(filters=128, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    # 4th convolutional layer
    vgg16.add(Conv2D(filters=128, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    vgg16.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
    # 5th convolutional layer
    vgg16.add(Conv2D(filters=512, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    # 6th convolutional layer
    vgg16.add(Conv2D(filters=512, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    # 7th convolutional layer
    vgg16.add(Conv2D(filters=512, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    vgg16.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
    # 8th convolutional layer
    vgg16.add(Conv2D(filters=1024, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    # 9th convolutional layer
    vgg16.add(Conv2D(filters=1024, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    # 10th convolutional layer
    vgg16.add(Conv2D(filters=1024, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    vgg16.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
    # Flatten before the fully connected layers
    vgg16.add(Flatten())
    # 1st fully connected layer
    vgg16.add(Dense(units=4096))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    # Dropout to prevent overfitting
    vgg16.add(Dropout(0.5))
    # 2nd fully connected layer
    vgg16.add(Dense(units=4096))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    # Dropout to prevent overfitting
    vgg16.add(Dropout(0.5))
    # Output layer: a single sigmoid unit for the binary "Attractive" label
    # (a one-unit softmax would always output 1.0, so sigmoid is used instead)
    vgg16.add(Dense(units=1))
    vgg16.add(Activation('sigmoid'))
    return vgg16
Oh, I will check and change the data preparation accordingly.
For further improvements, I am thinking of continuing with the AlexNet model but will try to augment the data using OpenCV. Or should I prefer any other technique?
I'll send the proposals asap. Thank you for the feedback.
We have a colab for MinDiff with text data here: https://colab.sandbox.google.com/github/tensorflow/model-remediation/blob/master/docs/min_diff/tutorials/min_diff_keras.ipynb
This example predicts the toxicity of text comments: https://blog.tensorflow.org/2020/11/applying-mindiff-to-improve-model.html
We want to add a new colab for MinDiff with image data.
Use the CelebA dataset, which contains over 200,000 images of celebrities with 40 binary attribute annotations. The dataset is split into train, validation, and test sets by its creators.
https://www.tensorflow.org/datasets/catalog/celeb_a
The images are annotated with 40 attributes that reflect appearance (for example, hair color and style, face shape, makeup), emotional state (smiling), gender, attractiveness, and age.
For this dataset, we use gender (male/female) as the binary protected characteristic and attractiveness as the predicted outcome, a proxy measure for getting invited to a job interview in the world of fame :)
Train a vanilla model that takes an image and predicts the "Attractive" attribute.
Use fairness indicators to compare the performance of the model on the male and female slices. Compare rates such as TPR and FPR.
Identify the majority and minority groups here.
Now retrain the model with MinDiff and re-evaluate it on the male and female slices.
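As a starting point, the MinDiff retraining step could look roughly like the sketch below, assuming a baseline original_model and batched datasets train_ds, sensitive_group_ds and nonsensitive_group_ds (placeholder names) have already been prepared along the lines of the text tutorial above:

import tensorflow as tf
from tensorflow_model_remediation import min_diff

# Pack the main training data together with the two MinDiff slices.
train_with_min_diff = min_diff.keras.utils.pack_min_diff_data(
    original_dataset=train_ds,
    sensitive_group_dataset=sensitive_group_ds,
    nonsensitive_group_dataset=nonsensitive_group_ds)

# Wrap the baseline model and penalise prediction-distribution gaps between the slices.
min_diff_model = min_diff.keras.MinDiffModel(
    original_model=original_model,
    loss=min_diff.losses.MMDLoss(),
    loss_weight=1.0)

min_diff_model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=['accuracy'])

min_diff_model.fit(train_with_min_diff, epochs=1)

The loss_weight trades off the MinDiff penalty against the primary loss, so it is one of the knobs worth tuning when comparing the slices before and after remediation.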