
A robust kinship verification scheme using face age transformation #59

Open vitalwarley opened 6 months ago

matheuslevi11 commented 2 months ago

Overview

This paper, published in 2023, consolidates and extends the authors' earlier work, "A Cross-age Kinship Verification Scheme Using Face Age Transfer Model" (IEEE, 2022). They present a kinship verification scheme that generates face images across several age groups to make training more robust and improve kinship accuracy. In essence, they generate these images, extract facial features that remain stable across age changes, and train the kinship verification model on those features. The main difference between the two papers is the age transformation model: the first used HRFAE, while here the authors build their own.

[Figure: overview of the proposed kinship verification scheme]
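As a rough outline, the scheme reads as a three-stage pipeline. A minimal sketch in Python, where `age_transform`, `extract_features`, and `verify_kinship` are hypothetical stand-ins for the paper's components and the target age set is assumed:

```python
# Hypothetical outline of the three-stage scheme described above;
# the function names and the age set are placeholders, not the paper's API.

TARGET_AGES = [20, 30, 40, 50, 60]  # assumed set of target age groups

def kinship_score(img_a, img_b, age_transform, extract_features, verify_kinship):
    # 1) Generate each face at several target ages.
    variants_a = [age_transform(img_a, age) for age in TARGET_AGES]
    variants_b = [age_transform(img_b, age) for age in TARGET_AGES]
    # 2) Extract (ideally age-invariant) features from every variant.
    feats_a = [extract_features(v) for v in variants_a]
    feats_b = [extract_features(v) for v in variants_b]
    # 3) Classify kinship from the pooled features.
    return verify_kinship(feats_a, feats_b)
```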

Face Age Transformation Model

The face age transformation model has four main components: an age encoder, an age classifier, a generator, and a discriminator. The age encoder is a simple FC layer with a sigmoid. The age classifier is a pretrained VGG-16 fine-tuned on the IMDB-WIKI dataset. The generator itself has an encoder and a decoder: the encoder, which contains three convolution layers and four residual blocks, extracts an encoded feature map from the image; the decoder, which contains two nearest-neighbor up-sampling layers and three convolution layers, decodes the feature map back into a facial image at the target age. The discriminator consists of six convolution layers with batch normalization and LeakyReLU, except in the first convolution layer.

[Figure: architecture of the face age transformation model]
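A hedged PyTorch sketch of these blocks, following only the layer counts given above; channel widths, kernel sizes, strides, and activations are assumptions, not the paper's exact configuration:

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain residual block; kernel sizes and widths are assumptions."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)

# Encoder: three convolution layers followed by four residual blocks.
encoder = nn.Sequential(
    nn.Conv2d(3, 64, 7, padding=3), nn.ReLU(inplace=True),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True),
    *[ResBlock(256) for _ in range(4)],
)

# Decoder: two nearest-neighbor up-sampling stages and three convolution layers.
decoder = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 3, 7, padding=3), nn.Tanh(),
)

# Discriminator: six convolution layers with BatchNorm + LeakyReLU,
# except that the first layer has no BatchNorm.
def disc_block(cin, cout, bn=True):
    layers = [nn.Conv2d(cin, cout, 4, stride=2, padding=1)]
    if bn:
        layers.append(nn.BatchNorm2d(cout))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return layers

discriminator = nn.Sequential(
    *disc_block(3, 64, bn=False),
    *disc_block(64, 128),
    *disc_block(128, 256),
    *disc_block(256, 512),
    *disc_block(512, 512),
    nn.Conv2d(512, 1, 4, stride=1, padding=1),  # sixth conv: patch logits
)
```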

Loss functions
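The exact loss terms are not reproduced here. As a hedged illustration, age-transfer GANs of this family typically combine an adversarial term, an age-classification term (via the pretrained age classifier), and a reconstruction term; the weights below are placeholders, not the paper's values:

```python
import torch
import torch.nn.functional as F

# Assumed loss mix for this family of age-transfer models; the paper's
# exact terms and weights may differ.
def generator_loss(d_fake_logits, age_logits, target_age, recon, original,
                   w_adv=1.0, w_age=1.0, w_rec=10.0):
    # Adversarial term: fool the discriminator on generated faces.
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    # Age term: the pretrained age classifier should predict the target age.
    age = F.cross_entropy(age_logits, target_age)
    # Reconstruction term: generating at the source age should return the input.
    rec = F.l1_loss(recon, original)
    return w_adv * adv + w_age * age + w_rec * rec
```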

Kinship Verification Model

This model consists of two networks: a facial feature extractor, which extracts features from the generated image groups, and a classifier, which decides whether the pair is kin. For feature extraction, the authors fine-tune an Inception-ResNet model on FIW. Training uses a triplet loss to pull kin-related images closer together in the embedding space. The authors also report using hard samples to increase learning efficiency: hard positives are far from the anchor, while hard negatives are near it.
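A minimal sketch of batch-hard triplet mining in PyTorch, a common recipe matching the description above (farthest positive, nearest negative); the paper's exact sampling procedure may differ. Here `labels` is assumed to hold family IDs:

```python
import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Batch-hard triplet loss: for each anchor, take the farthest positive
    and the nearest negative within the batch."""
    dist = torch.cdist(embeddings, embeddings)         # pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # same-family mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_mask = same & ~eye
    neg_mask = ~same
    # Hard positive: same family, maximal distance from the anchor.
    hardest_pos = (dist * pos_mask).max(dim=1).values
    # Hard negative: different family, minimal distance from the anchor.
    hardest_neg = dist.masked_fill(~neg_mask, float("inf")).min(dim=1).values
    return torch.relu(hardest_pos - hardest_neg + margin).mean()
```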

For classification, they build a network of convolution and residual blocks; on top of these, fully connected and softmax layers perform the kinship classification, trained with a cross-entropy loss.
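A hypothetical sketch of the classification head, omitting the convolution/residual trunk for brevity; the pair-fusion scheme (feature concatenation) and dimensions are assumptions, not the paper's design:

```python
import torch
import torch.nn as nn

class KinClassifier(nn.Module):
    """Assumed head: concatenate the pair's features, then FC layers
    producing kin / non-kin logits."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feat_dim * 2, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 2),
        )
    def forward(self, feat_a, feat_b):
        return self.fc(torch.cat([feat_a, feat_b], dim=1))

criterion = nn.CrossEntropyLoss()  # softmax + cross-entropy in one step
```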

Experiments and Results

The age transformation model is trained on the Cross-Age Celebrity Dataset (CACD), which contains 163,446 face images of 2,000 celebrities with age labels. The kinship verification model is trained on FIW, as described above, and evaluated on KinFaceW-I and KinFaceW-II using five-fold cross-validation.
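For concreteness, a sketch of a generic five-fold verification protocol: pick a decision threshold on four folds, test on the fifth. Note that KinFaceW ships fixed folds, so the random split below is a simplification:

```python
import numpy as np
from sklearn.model_selection import KFold

def five_fold_accuracy(scores, labels, seed=0):
    """Mean verification accuracy over five folds; `scores` are pair
    similarity scores, `labels` are 0/1 kin labels."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    accs = []
    for tr, te in KFold(5, shuffle=True, random_state=seed).split(scores):
        # Choose the threshold that maximizes accuracy on the training folds.
        best_t = max(np.unique(scores[tr]),
                     key=lambda t: ((scores[tr] >= t) == labels[tr]).mean())
        accs.append(((scores[te] >= best_t) == labels[te]).mean())
    return float(np.mean(accs))
```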

[Figure: results of the proposed strategy, with and without face age transformation]

Testing the impact of the proposed strategy, it is clear that the aging effect introduced by training on face-age-transformed images is relevant, even if the overall result is not SOTA (that title belongs to #68).

[Figure: ROC curves for each kinship type]

Looking at the ROC curves, all kinship types improve when aging is used, especially Father-Son, which suggests the method is effective at reducing the effect of age in kinship verification and could help build age-invariant kinship models.
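For reference, per-kinship-type ROC/AUC can be computed directly from verification scores; `results` below is a hypothetical mapping from kinship type to (scores, labels), not data from the paper:

```python
from sklearn.metrics import roc_auc_score

def per_type_auc(results):
    """results: dict like {"F-S": (scores, labels), ...} -> AUC per type."""
    return {kin: roc_auc_score(labels, scores)
            for kin, (scores, labels) in results.items()}
```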

vitalwarley commented 2 months ago

Nice summary, @matheuslevi11. I'm curious to know more details about the model. Great that it's recent. Is there code?

It's strange that they use old backbones like VGG-16, given that recent kinship works use ArcFace or AdaFace. Another "annoying" thing is that there is no evaluation on FIW itself... Meanwhile, on KFW the results are clear, but they don't reach SOTA.

I keep wondering how much a contrastive loss would help here, and also what results an adapted sampler, similar to the one proposed in #80, would bring. In my experiments, still to be recorded here, the sampler significantly improved the baseline. What if it were adapted to also take age into account? An idea for us to evaluate soon.
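To make the idea concrete, here is a hypothetical age-aware sampler (not the #80 sampler itself): it bins kin pairs by age gap and draws batches uniformly over the bins, so large-gap pairs are not under-represented:

```python
import random
from collections import defaultdict

class AgeGapSampler:
    """Hypothetical sketch: balance batches across age-gap bins."""
    def __init__(self, pairs, ages, bin_width=10):
        # pairs: list of (idx_a, idx_b, is_kin); ages: index -> age in years.
        self.bins = defaultdict(list)
        for a, b, kin in pairs:
            gap = abs(ages[a] - ages[b]) // bin_width
            self.bins[gap].append((a, b, kin))

    def sample_batch(self, batch_size):
        # Draw a bin uniformly, then a pair uniformly within that bin.
        keys = list(self.bins)
        return [random.choice(self.bins[random.choice(keys)])
                for _ in range(batch_size)]
```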

matheuslevi11 commented 1 month ago

Unfortunately, there is no code available. Although the paper includes detailed information on the model architectures and implementation details such as batch size and learning rate, I can't say for sure how easy it would be to reproduce. I would also have liked to see results on FIW; I think that would be interesting.

Regarding the contrastive loss, I've been thinking about that too, since they use triplet loss and get good results. As for the other questions, I find the idea promising; it's the direction I'm currently taking in my undergraduate thesis. We can talk more about it on Friday.

vitalwarley commented 1 month ago

I did some research on the subject and found an interesting, recent paper from the beginning of this month: Synthetic Face Ageing: Evaluation, Analysis and Facilitation of Age-Robust Facial Recognition Algorithms.

The paper "Synthetic Face Ageing: Evaluation, Analysis and Facilitation of Age-Robust Facial Recognition Algorithms" by Wang Yao et al. focuses on enhancing age-invariant face recognition (AIFR) systems using synthetic ageing data. Key points include:

  1. Objective: Improve AIFR systems by utilizing synthetic ageing data to address the challenges posed by ageing.
  2. Methodology: Evaluates synthetic ageing methods (SAM, CUSP, AgeTransGAN) and their impact on face recognition performance.
  3. Results: Models trained on synthetic ageing data show a 3.33% improvement in recognition rates, especially with a 40-year age gap.
  4. Conclusion: Synthetic ageing data enhances AIFR systems, but synthetic images still have limitations compared to real images.

Also relevant is another dataset, which the paper in this issue did not use: B3FD.

The Biometrically Filtered Famous Figure Dataset [30], or B3FD, merges two large web-based datasets, CACD [31] and IMDB-WIKI [32]. IMDB-WIKI is the largest publicly available dataset of face images with gender and age labels, collected directly from open internet sources, which suggests it contains a significant amount of mislabeled data despite its widespread use for training. Since web-scraping approaches to automatic data collection can produce large amounts of weakly labelled and noisy data, B3FD focuses on cleaning web-scraped facial datasets by automatically removing erroneous samples that impair their usability. B3FD has 375,592 images of 53,759 unique subjects, with age labels ranging from 0 to 100. It comprises two subsets, B3FD-IWS and B3FD-CS: B3FD-IWS consists of 245,204 images from the IMDB-WIKI dataset with 53,568 unique subjects, while B3FD-CS consists of 130,388 processed samples from the CACD dataset with 1,831 unique subjects. Since its data volume is large enough, this dataset has become a baseline for exploring the effect of real-world age intervals on face recognition.