tensorflow / model-remediation

Model Remediation is a library that provides solutions for machine learning practitioners working to create and train models in a way that reduces or eliminates user harm resulting from underlying performance biases.
https://www.tensorflow.org/responsible_ai/model_remediation?hl=en
Apache License 2.0

[GSOC]: Create a simple colab for mindiff with CelebA data #26

Closed bhaktipriya closed 2 years ago

bhaktipriya commented 2 years ago

We have a colab for mindiff with text data here https://colab.sandbox.google.com/github/tensorflow/model-remediation/blob/master/docs/min_diff/tutorials/min_diff_keras.ipynb

This example trains a text classifier to predict toxicity: https://blog.tensorflow.org/2020/11/applying-mindiff-to-improve-model.html

We want to add a new colab for mindiff with image data.

Use the CelebA dataset, which contains over 200,000 images of celebrities with 40 binary attribute annotations. The dataset is split into train, validation, and test sets by its creators.

https://www.tensorflow.org/datasets/catalog/celeb_a

The images are annotated with 40 attributes that reflect appearance (hair color and style, face shape, makeup, for example), emotional state (smiling), gender, attractiveness, and age.

For this dataset, we use gender (male/female) as a binary protected characteristic and attractiveness as the predicted outcome, which serves as a proxy for getting invited to a job interview in the world of fame :)

Train a vanilla model that takes an image and predicts the "Attractive" attribute.

Use Fairness Indicators to compare the model's performance on the male and female slices. Compare rates such as TPR and FPR.

Identify the majority and minority groups here.

Now retrain the model with mindiff and re-evaluate it on the male and female slices.
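The per-slice comparison described above can be sketched in a few lines of NumPy. This is only an illustration on made-up labels and predictions, not the Fairness Indicators API; `slice_rates` and the toy arrays are hypothetical names and data.

```python
import numpy as np

def slice_rates(y_true, y_pred, in_slice):
    """Return (TPR, FPR) for the examples selected by a boolean mask."""
    mask = np.asarray(in_slice, dtype=bool)
    y_true = np.asarray(y_true, dtype=bool)[mask]
    y_pred = np.asarray(y_pred, dtype=bool)[mask]
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    # Guard against slices that contain no positives or no negatives.
    tpr = float(tp / (tp + fn)) if (tp + fn) else float("nan")
    fpr = float(fp / (fp + tn)) if (fp + tn) else float("nan")
    return tpr, fpr

# Toy data: 1 = "Attractive" label / prediction, 1 = male.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
is_male = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)

male_rates = slice_rates(y_true, y_pred, is_male)
female_rates = slice_rates(y_true, y_pred, ~is_male)
print("male   TPR, FPR:", male_rates)    # (0.5, 0.0)
print("female TPR, FPR:", female_rates)  # (1.0, 0.5)
```

A large gap in either rate between the two slices is the kind of disparity the retraining step is meant to reduce.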

bhaktipriya commented 2 years ago

Share your colab here for reviews.

varuniyer commented 2 years ago

For this task, would it be okay to train a small model like MobileNetV2 to minimize training time on the CelebA dataset?

bhaktipriya commented 2 years ago

Yes. Any baseline model for image classification works! Thanks, Varun!

PrinceP commented 2 years ago

Hi @bhaktipriya, I would like some clarification about the task.

What does "binary protected characteristic" mean for the slice information? Do we need to consider male and female data separately?

I think attractiveness can be affected by a person's age (Young or not) and emotional state (Smiling or not). Does male/female also need to be taken into account below?

[Image: Screenshot 2022-03-12 at 2 00 41 PM]

YASH-GU24 commented 2 years ago

Hello @bhaktipriya, as asked, I have implemented the given task. Please check out my notebook here: https://colab.research.google.com/drive/14dhbf66TTLeKrP2CUbEuzh3wbd00KBIM?usp=sharing

Thanks a lot! I really learned a lot and enjoyed doing this.

bhaktipriya commented 2 years ago

@PrinceP great questions! I'd encourage you to work on the colab too.

On slices: Compare how many males are marked as attractive and how many females are marked as attractive. Report these in the colab. See if there can be a better formulation for the problem.

I think attractiveness can be affected by a person's age (Young or not) and emotional state (Smiling or not). Does male/female also need to be taken into account below?

100% agree. Analyse how the labels are distributed across these slices: Attractive/Young, Attractive/Old, Attractive/Young/Male, Attractive/Young/Female, etc. You can also use the Know Your Data tool to get these numbers. Identify which labels are highly correlated using PMI (pointwise mutual information). Once you have identified the slices that show bias, apply mindiff. Do mention in your colab how you selected these slices.

https://knowyourdata-tfds.withgoogle.com/#dataset=celeb_a&tab=RELATIONS&relations=kyd%2Fceleb_a%2Fattributes_Attractive,kyd%2Fceleb_a%2Fattributes_Male&relations_baseline_column=0&expanded_groups=celeb_a
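As a rough illustration of the PMI check suggested above: PMI(a, b) = log2(P(a, b) / (P(a) P(b))), which is positive when two attributes co-occur more often than independence would predict and negative when they co-occur less often. The sketch below uses made-up binary columns, not actual CelebA statistics, and `pmi` is a hypothetical helper name.

```python
import math

def pmi(a, b):
    """Pointwise mutual information (base 2) of two binary attribute columns."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(1 for x, y in zip(a, b) if x and y) / n
    if p_ab == 0:
        return float("-inf")  # the two attributes never co-occur
    return math.log2(p_ab / (p_a * p_b))

# Toy columns (made up): 1 means the attribute is present.
attractive = [1, 1, 1, 1, 0, 0, 0, 0]
male = [0, 0, 0, 1, 1, 1, 1, 0]
female = [1 - m for m in male]

print(pmi(attractive, male))    # -1.0
print(pmi(attractive, female))  # about 0.585
```

In this toy data the "Attractive" label co-occurs with "female" more often than chance and with "male" less often, which is the kind of signal that would flag the pair for a closer look.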

bhaktipriya commented 2 years ago

@varuniyer yes, you can use a simple model too. You can also work on a different dataset of your choice (UCI Adult Income dataset with gender, German Credit dataset with age) and add your colab here. I have also posted another task with TFCO here: https://github.com/tensorflow/model-remediation/issues/27

bhaktipriya commented 2 years ago

The COMPAS dataset is also something you can work with: https://www.tensorflow.org/responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_Lineage_Case_Study. Focus on the fairness metrics for African-American and Caucasian defendants.

bhaktipriya commented 2 years ago

@YASH-GU24 excellent colab and great execution! Please see if you can tune the mindiff model for better performance by adjusting kernel size or mindiff weights. Thanks a lot for working on it!
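To make the role of the mindiff weight concrete, here is a minimal NumPy sketch under stated assumptions: the training objective is taken to be the task loss plus a weighted penalty on the difference between the score distributions of two groups, and that penalty is approximated with a Gaussian-kernel MMD. The function names, the kernel length, and the toy scores are all hypothetical; this is not the library's implementation.

```python
import numpy as np

def gaussian_mmd(scores_a, scores_b, length=0.1):
    """Biased estimate of squared MMD between two 1-D samples of scores."""
    a = np.asarray(scores_a, dtype=float)[:, None]
    b = np.asarray(scores_b, dtype=float)[:, None]

    def kernel(u, v):
        # Gaussian kernel evaluated on every pair of scores.
        return np.exp(-((u - v.T) ** 2) / (2.0 * length ** 2))

    return float(kernel(a, a).mean() + kernel(b, b).mean()
                 - 2.0 * kernel(a, b).mean())

def total_loss(task_loss, scores_a, scores_b, min_diff_weight):
    """Task loss plus the weighted distribution-difference penalty."""
    return task_loss + min_diff_weight * gaussian_mmd(scores_a, scores_b)

# Toy prediction scores for two groups (made up).
group_a = [0.10, 0.20, 0.15, 0.30]
group_b = [0.60, 0.70, 0.65, 0.80]

for weight in (0.0, 1.0, 5.0):
    print(weight, total_loss(0.4, group_a, group_b, weight))
```

A larger weight pushes training harder toward matching the two score distributions, usually at some cost to the task loss, which is why the weight is worth tuning.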

YASH-GU24 commented 2 years ago

Thanks for the feedback, @bhaktipriya. As far as the MinDiff model is concerned, I am fairly sure its accuracy will improve if we increase the number of epochs beyond 1, but I didn't do so because the training time was quite long and our basic aim was to apply MinDiff and compare results. I will certainly try to improve the accuracy and let you know the results.

ronakkkk commented 2 years ago

Hello @bhaktipriya,

The task for the CelebA dataset is mostly covered by the linked tutorial (https://www.tensorflow.org/responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_TFCO_CelebA_Case_Study) and by @YASH-GU24, so I tried to implement it manually to get better insight.

I have trained the original model as well as a MinDiff version. Would you mind looking at the code to confirm that I am on the right path?

Also, would you mind sharing some key insights into how MinDiff works? For example, does it just make the slices of data equal, or does it do something else too?

One more thing: are we going to address feedback parity issues to improve the biased model? I faced many difficulties when deploying models at the industry level because the number of users varies based on the result.

Please download the dataset to run the Colab, as I implemented it in PyCharm.

Program Code: https://colab.research.google.com/drive/1hbECDaZmKn8A8aUsRZngmwHczR8xgiby?hl=en#scrollTo=c2zGiJGf0Xas

vishesh-soni commented 2 years ago

Hello @bhaktipriya I tried to work on the task. Please look into the notebook and give feedback. https://colab.research.google.com/drive/15ldpOLeyxjWHPfg3DQ3rrtbe1zJ5gbbR?usp=sharing

bhaktipriya commented 2 years ago

@ronakkkk, @vishesh-soni and future contributors, I highly recommend doing this analysis on datasets other than celebA https://github.com/tensorflow/model-remediation/issues/26#issuecomment-1067527433 since @YASH-GU24 already has a colab for celebA.

If you do wish to proceed with CelebA, provide an analysis of which slices (other than male/female, say young/old) are underperforming and may be subject to bias. See https://github.com/tensorflow/model-remediation/issues/26#issuecomment-1067396897. You can experiment with mindiff on those slices.

Also try tuning mindiff to get better performance. See the comment here: https://github.com/tensorflow/model-remediation/issues/26#issuecomment-1067550604

ronakkkk commented 2 years ago

@bhaktipriya Okay, I will change the code accordingly. Thank you.

ronakkkk commented 2 years ago

@bhaktipriya I have tried to tune the MinDiff model using the AlexNet and MobileNetV2 models. I also tried augmenting the training data, but the AlexNet model performs best without augmentation, so I am sharing the Google Colab for that version.

Link: https://colab.research.google.com/drive/1YGnRLqT2DvncfwppAQ17fli_OtyWqXkh?usp=sharing

bhaktipriya commented 2 years ago

This is great @ronakkkk. Improvements both in FPR and FNR. The diff between the slices has definitely reduced. Well documented and great execution.

What was the augmentation that you performed? I think I'm missing that.

The documentation in your data prep section does not match what the code does: "A split for Attractive examples referring to males. This can easily be done by using the filter() method to filter out all the examples which are not male or not attractive. We will name this dataset 'dataset_train_sensitive'. A split for Attractive examples referring to females. This can also easily be done by using the filter() method to filter out all the examples which are male or not attractive. We will name this dataset 'dataset_train_nonsensitive'." You're explicitly filtering for attractive samples :)

Also, please send your proposals to us. Emails and instructions are in the contributor document.
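The split described in the quoted documentation can be sketched in plain Python. The records below are made-up stand-ins for CelebA examples and the helper names are hypothetical; in a tf.data pipeline, predicates with the same logic would be passed to Dataset.filter().

```python
# Toy records standing in for CelebA examples (made-up values).
examples = [
    {"id": 0, "Male": True, "Attractive": True},
    {"id": 1, "Male": True, "Attractive": False},
    {"id": 2, "Male": False, "Attractive": True},
    {"id": 3, "Male": False, "Attractive": False},
    {"id": 4, "Male": False, "Attractive": True},
]

def is_attractive_male(example):
    # "Filter out examples that are not male or not attractive" is the
    # same as keeping examples that are male AND attractive.
    return example["Male"] and example["Attractive"]

def is_attractive_female(example):
    # Keep examples that are not male AND attractive.
    return (not example["Male"]) and example["Attractive"]

sensitive = [ex for ex in examples if is_attractive_male(ex)]
nonsensitive = [ex for ex in examples if is_attractive_female(ex)]

print([ex["id"] for ex in sensitive])     # [0]
print([ex["id"] for ex in nonsensitive])  # [2, 4]
```

Both splits keep only examples labelled attractive; examples with the negative label never reach either split.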

bhaktipriya commented 2 years ago

https://colab.research.google.com/drive/15ldpOLeyxjWHPfg3DQ3rrtbe1zJ5gbbR?usp=sharing

Thanks for sharing. Very well documented, with thorough experiments. Optional: I would also try running mindiff on the male and female slices where the label is non-attractive.

Thanks a lot for the work @vishesh-soni. Also, please send your proposals to us. Emails and instructions are in the contributor document.

bhaktipriya commented 2 years ago

All: Please send your proposals to us. Emails and instructions are in the contributor document.

ronakkkk commented 2 years ago

@bhaktipriya I did the augmentation using:

data_augmentation = tf.keras.Sequential([
  tf.keras.layers.RandomFlip("horizontal_and_vertical"),
  tf.keras.layers.RandomRotation(0.2),
])

I added it to the AlexNet model, but it didn't improve the accuracy of the MinDiff model, so I haven't included it in the code.

For further improvement, I tried the VGG16 model, but its accuracy only reached 0.55, so I haven't moved further with that model.

VGG16 Model:

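# VGG16 model
import tensorflow as tf
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     Dense, Dropout, Flatten, MaxPool2D)

def vgg16_model():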
    vgg16 = tf.keras.models.Sequential()

    # 1st Convolution Layer
    vgg16.add(Conv2D(filters=64, input_shape=(128, 128, 3), kernel_size=(3, 3), padding='same', name='image'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))

    # 2nd Convolutional Layer
    vgg16.add(Conv2D(filters=128, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    vgg16.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

    # 3rd Convolutional Layer
    vgg16.add(Conv2D(filters=128, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))

    # 4th Convolutional Layer
    vgg16.add(Conv2D(filters=128, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    vgg16.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

    # 5th Convolutional Layer
    vgg16.add(Conv2D(filters=512, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))

    # 6th Convolutional Layer
    vgg16.add(Conv2D(filters=512, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))

    # 7th Convolutional Layer
    vgg16.add(Conv2D(filters=512, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    vgg16.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

    # 8th Convolutional Layer
    vgg16.add(Conv2D(filters=1024, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))

    # 9th Convolutional Layer
    vgg16.add(Conv2D(filters=1024, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))

    # 10th Convolutional Layer
    vgg16.add(Conv2D(filters=1024, kernel_size=(3, 3), padding='same'))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    vgg16.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

    # Passing it to a Fully Connected layer
    vgg16.add(Flatten())

    # 1st Fully Connected Layer
    vgg16.add(Dense(units=4096))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    # Add Dropout to prevent overfitting
    vgg16.add(Dropout(0.5))

    # 2nd Fully Connected Layer
    vgg16.add(Dense(units=4096))
    vgg16.add(BatchNormalization())
    vgg16.add(Activation('relu'))
    # Add Dropout to prevent overfitting
    vgg16.add(Dropout(0.5))

    # Output layer: a single unit for binary classification
    vgg16.add(Dense(units=1))
    vgg16.add(BatchNormalization())
    # Sigmoid, not softmax: softmax over a single unit always outputs 1.0
    vgg16.add(Activation('sigmoid'))

    return vgg16
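
For a one-unit binary output like the one in this model, the choice of final activation matters. The NumPy sketch below is not part of the posted notebook, and its helper names are illustrative. It shows why softmax is unsuitable there: softmax normalizes across the units of a layer, so with a single unit it returns 1.0 for every input, whereas sigmoid maps the logit to a probability.

```python
import numpy as np

def softmax(logits):
    """Softmax over the last axis, shifted for numerical stability."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def sigmoid(logits):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-logits))

# A batch of three examples, each with a single output logit.
logits = np.array([[-3.0], [0.0], [2.5]])

softmax_out = softmax(logits)
sigmoid_out = sigmoid(logits)
print(softmax_out.ravel())  # [1. 1. 1.]
print(sigmoid_out.ravel())  # increases with the logit; 0.5 at logit 0
```

A model whose output is constant predicts the same class for every image, so its accuracy cannot move away from the share of that class in the data.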

Oh, I will check the data preparation and make the changes accordingly.

For further improvements, I am thinking of continuing with the AlexNet model but augmenting the data using OpenCV. Or should I prefer other techniques?

I'll send the proposals as soon as possible. Thank you for the feedback.