I am currently in Section 4.2, which covers the loss function. One of the paragraphs was relatively hard to understand. I used the Consensus app inside ChatGPT to get clarification. The answer is well grounded, with citations to the works that underpin the concepts I needed to understand.
An excerpt:
The concept of fairness-aware contrastive loss function in facial recognition, as described in your query, involves several technical aspects: larger gradients, similarity to margin penalty, balancing unfairness, and achieving consistent compactness across races.
More details at the link mentioned above.
Reference 22, Fair contrastive learning for facial attribute classification, exploits the interrelation between anchor and sample to design a sensitive attribute removing loss function.
Reference 42, Consistent instance false positive improves fairness in face recognition, uses instance FPR in loss function to constrain bias.
Both 22 and 42 proposed a fairness-aware loss function.
Reference 39, Fairness-aware adversarial perturbation towards bias mitigation for deployed deep models, implemented post-processing data perturbation that can hide the information of protected attributes without changing the deployed models' parameters and structures.
Reference 11, Mitigating face recognition bias via group adaptive classifier, proposed to include demographic-adaptive layers that make the model generate face representations for every demographic group.
It is important to make the model focus on the most critical regions. Siamese nets and attention schemes are popular methods in kinship verification because they focus on similar facial traits.
They reverse the gradient of the race classification branch to remove the racial information in the feature vector.
They design a fairness-aware contrastive loss function that can mitigate pairwise bias and significantly decrease the standard deviation in four races.
The first work to propose to mitigate bias and achieve SOTA accuracy simultaneously for kinship verification.
A fairness-aware contrastive loss function that mitigates the pairwise bias and balances the degree of compactness of every race, which improves racial fairness.
A large kinship dataset with racial labels from several public kinship datasets.
Deep fusion siamese network for automatic kinship verification (2020)
- Proposed a feature fusion method that takes discriminative features from the backbone network and fuses them to determine whether two face images share a kinship relationship.
Supervised Contrastive Learning for Facial Kinship Recognition (2021)
- Adopted ArcFace as the backbone model pre-trained on MS-Celeb-1M to obtain more representative features. Moreover, they used a supervised contrastive loss function to contrast samples against each other and a hyperparameter ($\tau$, temperature) to focus on hard samples, thus enhancing the ability to distinguish the kinship relation.
Kinship representation learning with face componential relation (2023)
- Successfully enhanced the accuracy of the kinship verification task by leveraging an attention mechanism. They combined the attention mechanism with the backbone to focus on the most discriminative parts of the facial image (e.g., the facial features: eyes, nose, mouth). They also proposed a new loss function that combines the contrastive loss with the attention map produced by the attention mechanism.
Innovations to tackle racial bias in AI systems happen in the development of new algorithms (how so?), new model architectures, or novel loss functions.
Adversarial learning with gradient reversal layer to learn fair features.
Gradient reversal against discrimination (2018)
- They devised fair features using an adversarial learning technique. This method involved the incorporation of a gradient reversal layer, effectively flipping the gradient of the classification head for sensitive attributes. This strategic move encouraged the model’s encoder to generate features devoid of sensitive information, thus reducing potential bias.
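For intuition, a minimal PyTorch sketch of such a gradient reversal layer (the identifiers are mine, not taken from the cited paper):

```python
import torch
from torch.autograd import Function


class GradReverse(Function):
    """Identity in the forward pass; flips (and scales) the gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # The encoder receives -lambd * (gradient of the sensitive-attribute head),
        # which pushes it to produce features without that information.
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)


# Usage sketch: race_logits = race_head(grad_reverse(features, lambd=1.0))
```

The sensitive-attribute classification loss is then computed on `race_logits` as usual; only the gradient flowing back into the encoder is reversed.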
Adversarial learning to attain discriminative features while disentangling features into four crucial attributes
Jointly debiasing face recognition and demographic attribute estimation (2020)
- They leveraged adversarial learning to attain discriminative feature representation, simultaneously disentangling features into four distinct attributes. This process of disentanglement aimed to preserve crucial attributes while discarding unfair ones. By carefully manipulating the feature space, the model could successfully eliminate biases linked with sensitive attributes.
Adversarial learning to conceal information associated with fairness-related attributes (e.g. race, skin color, gender, age, etc.) by input perturbation
Fairness-aware adversarial perturbation towards bias mitigation for deployed deep models (2022)
- They introduced an approach with the aim of mitigating bias in deployed models. Unlike previous state-of-the-art methods that focused on altering the deployed models, they took a different route by concentrating on perturbing inputs. They employed a discriminator trained to differentiate fairness-related attributes from latent representations within the deployed models. Simultaneously, an adversarially trained generator worked to deceive the discriminator, ultimately generating perturbations that can conceal the information associated with protected attributes.
Adversarial learning with adaptive layers to enhance representation robustness for different demographic groups
Mitigating face recognition bias via group adaptive classifier
- In addition to the use of adversarial learning, they proposed the incorporation of adaptive layers within the model structure. The introduced adaptive layer aimed to enhance representation robustness for different demographic groups. An automation module was integrated to determine the optimal usage of adaptive layers in various model layers, dynamically adjusting the network’s behavior to cater to the unique requirements of different groups.
Softmax loss function with instance False Positive Rate
Consistent instance false positive improves fairness in face recognition
- Another approach involved the modification of the softmax loss function with a novel penalty term to mitigate bias while concurrently improving accuracy. They achieved this by utilizing instance False Positive Rate as a surrogate for demographic False Positive Rate, eliminating the need for explicit demographic group labels.
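A rough, illustrative sketch of the instance-FPR idea (my interpretation, not the paper's exact formulation): estimate a global decision threshold from the desired overall FPR, softly count each instance's false positives above it, and penalize deviation from the batch-level rate.

```python
import torch


def instance_fpr_penalty(neg_sims, target_fpr=1e-3, temp=0.01):
    """neg_sims: (B, K) cosine similarities between each instance and its K negatives.

    Returns a scalar penalty that grows when some instances produce many more
    false positives than the batch average (a proxy for demographic FPR gaps).
    """
    # Threshold such that roughly `target_fpr` of all negative scores fall above it.
    thr = torch.quantile(neg_sims.detach().flatten(), 1.0 - target_fpr)
    # Soft indicator of "negative scored above the threshold", i.e., a false positive.
    soft_fp = torch.sigmoid((neg_sims - thr) / temp)
    inst_fpr = soft_fp.mean(dim=1)   # per-instance false positive rate
    return (inst_fpr - inst_fpr.mean()).abs().mean()
```

In the cited work this kind of term is added as a penalty to a margin-based softmax loss; the snippet above only sketches the surrogate idea.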
Could this strategy be used for other biases, like gender and age, where we do not necessarily have ground-truth labels?
A novel loss function combining CosFace with bias difference to minimize identity bias
MixFairFace: Towards Ultimate Fairness via MixFair Adapter in Face Recognition (2022)
- They shifted their focus from demographic group bias to identity bias. They combined CosFace [36] with a bias difference term to create a novel loss function. Their belief was that by targeting identity bias they could address the problem of skewed outcomes and treat all individuals impartially, striving for a comprehensive fairness that does not divide people by race. This approach minimizes identity bias without requiring sensitive attribute labels, thereby effectively enhancing fairness between demographic groups.
The authors, then, propose to integrate fairness and accuracy, aiming to improve both aspects. They do so by using adversarial learning with a fairness-aware loss function in a multi-task model structure with an attention mechanism.
KinRace is composed of six datasets: CornellKin, UBKinFace, KinFaceW-I, KinFaceW-II, Family101, and FIW.
They use only the main kinship types: FS, FD, MS, MD.
They limit the total number of images for each identity to at most 30.
They label each sample manually with one of four races: African, Asian, Caucasian, and Indian.
To mitigate the other-race effect, they use three annotators of different races. The ground truth is determined by majority vote; if there is no majority, the identity is not used.
KinRace's racial distribution follows BUPT-Globalface, which is approximately the same as the real-world distribution.
Mixed-race positive pairs are removed.
They created KinRace because of the absence of race labels in kinship datasets. Also, they use four races to enable studies on the same benchmark.
They manage to reduce race bias, but identity bias still exists, even though they limit each identity to 30 images.
Data quality alone doesn't significantly improve results, but since it is crucial to face verification, the authors plan to explore it in future work.
Certain facial features used to determine kinship might be closely linked with racial characteristics. When these racial characteristics are deliberately obscured to avoid bias, the model may lose some of the information that was helping it accurately verify kinship.
The authors build on top of two previous works: the supervised contrastive loss from Supervised Contrastive Learning for Facial Kinship Recognition (2021) and a loss with a debias term from MixFairFace: Towards Ultimate Fairness via MixFair Adapter in Face Recognition (2022).
They propose the fairness-aware contrastive loss function: $L_{\text{fairness}} = -\log \frac{e^{\left(\cos(x_i,y_i) - b_i\right)/\tau}}{\sum_{j\neq i}^N \left[e^{\cos(x_i,x_j)/\tau} + e^{\cos(x_i,y_j)/\tau}\right] + e^{\left(\cos(x_i,y_i)-b_i\right)/\tau}}$, where $b_i$ is the batch average of the $\epsilon$ defined by $\cos(M(f_m), M(f_i))^2 - \cos(M(f_m), M(f_j))^2 = \epsilon$. They use the cross-entropy loss to train the race classifier: $L_{\text{race}} = - \sum_{i=1}^n t_i \log(p_i)$. The total loss is $L_{\text{total}} = L_{\text{fairness}} + L_{\text{race}}$.
$(x_i, y_i)$ are positive pairs, while $(x_i, x_j)$ and $(x_i, y_j)_{(j \neq i)}$ are negative pairs. See Zhang et al. (2021), Supervised Contrastive Learning for Facial Kinship Recognition.
$f_m = \frac{1}{2}(f_i + f_j)$ and $M(\cdot)$ is the debias layer. If $\epsilon > 0$, then $i$ has a larger (face recognition) bias than $j$. See MixFairFace: Towards Ultimate Fairness via MixFair Adapter in Face Recognition (2022).
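A minimal PyTorch sketch of $L_{\text{fairness}}$ as I read the formula above (names and shapes are my own assumptions, not the authors' code); $b$ would come from the debias layer $M$ as described:

```python
import torch
import torch.nn.functional as F


def fair_contrastive_loss(x, y, b, tau=0.08):
    """x, y: (N, D) embeddings of the two images in each positive pair (x_i, y_i).
    b: per-pair debias term b_i (or the scalar batch average of epsilon), per the note above.
    """
    x, y = F.normalize(x, dim=1), F.normalize(y, dim=1)
    sim_xx = x @ x.t() / tau                           # cos(x_i, x_j) / tau
    sim_xy = x @ y.t() / tau                           # cos(x_i, y_j) / tau
    eye = torch.eye(x.size(0), dtype=torch.bool, device=x.device)

    pos = sim_xy.diagonal() - b / tau                  # (cos(x_i, y_i) - b_i) / tau
    neg_xx = sim_xx.masked_fill(eye, float("-inf"))    # drop the j == i terms
    neg_xy = sim_xy.masked_fill(eye, float("-inf"))

    # -log( e^pos / (sum_{j != i} [e^{xx} + e^{xy}] + e^pos) ), computed stably.
    logits = torch.cat([pos.unsqueeze(1), neg_xx, neg_xy], dim=1)
    return (torch.logsumexp(logits, dim=1) - pos).mean()
```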
The loss function proposed by MixFairFace: Towards Ultimate Fairness via MixFair Adapter in Face Recognition (2022) aims to reduce identity bias. How is race included in $L_{\text{fairness}}$, then?
In the original paper, the authors defined identity bias as the performance variance between "each identity". How should we understand this in the context of kinship verification? As the performance variance between "each kinship pair"? The debias layer receives both feature vectors; together they represent a positive or negative pair.
Further in the paper, section 4.2, they explain identity biases as those "introduced by their races, genders, or other individual differences".
They build upon Understanding the behaviour of contrastive loss (2021) to validate the idea that a positive bias $b_i$ means a stronger learning signal ($P_{i,j}$ is larger) for positive and negative pairs.
$\frac{\partial L(x_i)}{\partial \cos(x_i, y_i)} = -\frac{1}{\tau} \sum_{k \neq i} P_{i,k}, \quad \frac{\partial L(x_i)}{\partial \cos(x_i, x_j)} = \frac{1}{\tau} P_{i,j}$: the gradients with respect to the positive sample and to each negative sample, respectively. $P_{i,j}$ is the probability of $x_i$ and $x_j$ being recognized as a positive pair.
They show the change only in $P_{i,j}$, but I think they also add $b_i$ to $P_{i,k}$. Otherwise, it doesn't make sense, because the former relates to the different negative samples.
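For reference, my own short derivation under the loss form above, which also shows where $b_i$ enters both gradients. Write $a_i = (\cos(x_i, y_i) - b_i)/\tau$ and $c_{i,k} = \cos(x_i, \cdot_k)/\tau$ for the negatives $k \neq i$, so that

$$L(x_i) = -a_i + \log\Big(e^{a_i} + \sum_{k \neq i} e^{c_{i,k}}\Big), \qquad P_{i,k} = \frac{e^{c_{i,k}}}{e^{a_i} + \sum_{l \neq i} e^{c_{i,l}}}.$$

Differentiating with respect to the positive similarity and to one negative similarity gives

$$\frac{\partial L(x_i)}{\partial \cos(x_i, y_i)} = \frac{1}{\tau}\left(-1 + \frac{e^{a_i}}{e^{a_i} + \sum_{l \neq i} e^{c_{i,l}}}\right) = -\frac{1}{\tau}\sum_{k \neq i} P_{i,k}, \qquad \frac{\partial L(x_i)}{\partial \cos(x_i, x_j)} = \frac{1}{\tau} P_{i,j}.$$

A larger $b_i$ lowers $a_i$, which raises every $P_{i,k}$, so both gradient magnitudes grow; that is, $b_i$ indeed affects the whole sum $\sum_{k \neq i} P_{i,k}$, not only an individual $P_{i,j}$.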
- This work employs two methods for improving fairness: adversarial learning and a fair loss function. We use a race classifier in adversarial learning to remove racial information from the feature vectors, which decreases the standard deviation.
ArcFace model; feature maps in $\mathbb{R}^{7\times7\times512}$ and feature vectors in $\mathbb{R}^{512}$.
$\tau = 0.08$ (follows Supervised Contrastive Learning for Facial Kinship Recognition (2021)).
SGD with momentum = 0.9 and weight decay = 1e-4.
10 epochs with 60000 steps; batch size = 25.
Baseline is the SOTA2021 method (Supervised Contrastive Learning for Facial Kinship Recognition (2021)).
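A tiny PyTorch sketch of this training setup (the module and learning rate are placeholders; only the hyperparameters listed above come from the notes):

```python
import torch
from torch import nn

TAU = 0.08        # contrastive temperature
BATCH_SIZE = 25   # pairs per batch
EPOCHS = 10       # roughly 60000 steps in total

model = nn.Linear(512, 4)  # stand-in for the ArcFace backbone + heads
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.1,  # placeholder value, not reported in these notes
                            momentum=0.9, weight_decay=1e-4)
```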
- In the experiments below, "adversarial" means we reverse the gradient of the race classification branch, as indicated by the red line in Figure 2. "Multi-task" means we do not reverse the gradient of the race classification branch; instead, we just train the model normally, as indicated by the green line in Figure 2.
Both fairness strategies (gradient reversal and the debias layer) mitigate bias (reduce the standard deviation) but also harm accuracy. By merging both strategies, they remarkably reduce the standard deviation while boosting accuracy.
- Firstly, the feature vector excludes racial information, which benefits from adversarial learning. Secondly, the debias layer becomes more robust because it can generate the debias term based on the most essential facial features while racial traits are removed.
Their overall strategy enhances fairness while maintaining accuracy.
How can the debias layer generate a debias term if the feature vector has no racial information?
They compare their method with three other works (Achieving Better Kinship Recognition Through Better Baseline, Deep fusion siamese network for automatic kinship verification (2020), Supervised Contrastive Learning for Facial Kinship Recognition (2021)) that performed well in the RFIW challenge.
Evaluate the generalization of the method on other datasets: UB KinFace and FIW.
They note that their results are competitive with Kinship representation learning with face componential relation (2023) and claim better results because of the use of the ArcFace backbone.
The paper titled "KFC: Kinship Verification with Fair Contrastive Loss and Multi-Task Learning" by Jia Luo Peng, Keng Wei Chang, and Shang-Hong Lai, addresses the challenge of kinship verification in the presence of biases associated with gender, ethnicity, and age due to the lack of large-scale, diverse datasets. The authors propose a comprehensive solution involving a multi-task learning architecture with an attention module and introduce a fairness-aware contrastive loss function that incorporates a debiasing term with adversarial learning. The approach is evaluated on a newly constructed dataset named KinRace, designed to be robust against race-related biases.
The model's architecture is adept at counteracting biases while improving kinship verification performance. By combining gradient reversal and a fairness-aware contrastive loss function, the model can mitigate racial biases effectively without compromising the accuracy.
The attention module in the multi-task architecture concentrates on the relevant facial features, allowing for discrimination of kin relationships without racial information influencing the decision-making process.
The novel loss function proposed rightly extends previous work on supervised contrastive loss and debiasing terms, addressing both the accuracy and fairness in kinship verification, which had previously been handled separately.
The dataset KinRace has been carefully curated to represent different races evenly and excludes mixed-race pairs to ensure clarity in racial categories. This attention to detail underlines the importance of dataset quality in machine learning tasks, especially those sensitive to biases.
In relation to the KinRace dataset, further research could focus on including mixed-racial pairs and how the model would perform in kinship verification in more complex, diverse familial backgrounds.
Investigate how the debias layer functions when racial information has been extracted. Can the model still effectively generate debias terms based on non-racially discriminative features?
Re-evaluating the SOTA approaches on the KinRace dataset opens a question about the adaptability of models to new datasets with varied distributions. Future research could investigate optimal re-implementation guidelines for fair assessment when applying existing methods to new datasets.
It may be worth exploring the application of the proposed fairness-aware loss function and adversarial learning techniques to other domains where fairness is critical, such as credit scoring or predictive policing, to see if similar reductions in bias can be achieved.
Since the authors highlight the potential limitations of their method when employed on small datasets, it would be valuable to explore strategies that can enhance the performance and fairness in limited-data scenarios.
The reduction of race bias in models poses the question of whether similar mechanisms could be designed to mitigate other forms of biases, like age or gender biases, in datasets where corresponding labels might be unavailable or unreliable.
This research presents pivotal advancements in kinship verification accuracy and racial fairness, paving the way for more inclusive and ethically conscious AI models in facial recognition technologies.
The reduction of race bias in models poses the question of whether similar mechanisms could be designed to mitigate other forms of biases, like age or gender biases, in datasets where corresponding labels might be unavailable or unreliable.
This question, as well as the content above, was generated by GPT-4 using my notes. It is very pertinent to what we are already doing.
This paper was quite complex. I spent around 12 hours studying its content and, at times, the concepts or papers it cites. I need to be more efficient with the remaining ones.
To a large extent, this work was a combination of the works below.
I think our next steps should be taken with this question in mind. In that sense, what works exist that focus on removing gender and age biases? #41 was one; there is also #34.
Contrastive loss inspired by Supervised Contrastive Learning for Facial Kinship Recognition (2021)
- I think they build mostly upon this work -- network structure and hyperparameters.
Confirmed. Their code was adapted from #26. They also cite it explicitly.
I found it while looking for code for #49.