Implementation of evaluation problem

zdwong commented 4 years ago

The FNMR is reported at an FMR of 0.001 for different ratios of unconsidered images, where the unconsidered images are determined by quality. I am not sure whether imposter pairs are also filtered by quality when calculating the threshold at FMR 0.1%. Take LFW as an example: it has genuine and imposter pairs. Genuine pairs are filtered according to the ratio of unconsidered images before the FNMR is calculated. But are imposter pairs also filtered by quality before calculating FMR 0.1%?

pterhoer commented 4 years ago

Hi zdwong,

we calculated the FNMR at 0.001 FMR on the (remaining) images with the highest quality predictions. For instance, at a ratio of unconsidered images of 0.2, the 20% of images with the lowest quality are discarded and the recognition performance is computed on the remaining 80%. Please note that we calculated the recognition performance by creating and considering all possible genuine and imposter pairs, not only the predefined pairs of the test-set of LFW.

Best Philipp
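The filtering step described above can be sketched in a few lines. This is only an illustration, not the code used for the paper: the function name `keep_highest_quality` and the toy quality scores are hypothetical, and it assumes that a higher score means higher quality.

```python
def keep_highest_quality(qualities, ratio_unconsidered):
    """Return the indices that remain after dropping the lowest-quality fraction.

    Assumption: a higher value in `qualities` means a better image.
    """
    n_drop = round(len(qualities) * ratio_unconsidered)
    # Indices sorted by ascending quality, so the first n_drop are the worst.
    order = sorted(range(len(qualities)), key=lambda i: qualities[i])
    return sorted(order[n_drop:])


# Toy example: 10 images, ratio of unconsidered images = 0.2.
qualities = [0.91, 0.12, 0.55, 0.78, 0.30, 0.66, 0.05, 0.83, 0.47, 0.99]
kept = keep_highest_quality(qualities, 0.2)
print(kept)  # indices of the 8 images that are kept
```

Because the filtering happens on images, every pair (genuine or imposter) that involves a discarded image disappears with it.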

zdwong commented 4 years ago

@pterhoer thanks for your reply. But I still cannot fully understand "Please note that we calculated the recognition performance by creating and considering all possible genuine and imposter pairs, not only the predefined pairs of the test-set of LFW." For example, LFW has 3000 genuine pairs and 3000 imposter pairs. I determined the threshold at FMR 0.1% from the imposter pairs and then calculated the corresponding FNMR on the genuine pairs using that threshold. So my question is: how do you create and consider all possible genuine and imposter pairs?

Best

pterhoer commented 4 years ago

Hi zdwong,

the test-part of LFW consists of 6k predefined sample pairs. In our work, we did the experiments on the whole LFW database, taking into account all 13k images. We filtered out low-quality images based on the required quality threshold and created all possible pairs from the remaining images. For instance, if we want a ratio of unconsidered images of around 0.2, we get around 10k images that we can use. With these we create all possible pairs, leading to 10,000*9,999 pairs for the evaluation. Please note that in this case the number of imposter pairs is significantly higher than the number of genuine pairs. Therefore, the accuracy metric is not suitable here. However, FNMR@FMR and ROC curves work well.

Best Philipp
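To make "all possible pairs" concrete, here is a small sketch that enumerates them from a list of identity labels. The helper `make_pairs` and the toy labels are hypothetical. It counts each unordered pair once, which gives n(n-1)/2 comparisons; the 10,000*9,999 figure above corresponds to counting both orders of each pair.

```python
from itertools import combinations


def make_pairs(identities):
    """Split all unordered index pairs into genuine and imposter pairs."""
    genuine, imposter = [], []
    for i, j in combinations(range(len(identities)), 2):
        if identities[i] == identities[j]:
            genuine.append((i, j))   # same person
        else:
            imposter.append((i, j))  # different persons
    return genuine, imposter


# Toy example: six images of three persons.
identities = ["a", "a", "a", "b", "b", "c"]
genuine, imposter = make_pairs(identities)
print(len(genuine), len(imposter))
```

Even in this tiny example the imposter pairs outnumber the genuine ones, and the imbalance grows quickly with the number of identities, which is why plain accuracy is not a suitable metric here.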

zdwong commented 4 years ago

@pterhoer thanks Philipp!

fffanxt commented 4 years ago

@zdwong hi! I am trying to reproduce the results (the error-vs-reject curve for LFW with ArcFace). I got a curve whose FNMR is 100 times smaller than the one in the paper. Did you get a curve like the one in the paper?

pterhoer commented 4 years ago

Hi fffanxt,

the results in the paper are reported in terms of the "total rate", meaning that FNMR@0.001 FMR = FNMR@0.1% FMR. Perhaps you just plotted your results in %. In that case, you successfully reproduced the results :)

Best Philipp

XJX777 commented 2 years ago

Hi @pterhoer , thanks for your excellent job! Is there any open source code for the test method you mentioned here or any recommended reference code? I didn't find the appropriate code in any other open source project.

pterhoer commented 2 years ago

Hi XJX777,

thank you for your feedback. This might be helpful for you: https://github.com/IrvingMeng/MagFace/tree/main/eval/eval_quality. I don't know anyone who has tested this code, but the results in the paper look correct.

Otherwise, writing the evaluation code is not so hard:

1. Predict the quality values for your samples.
2. Sort them.
3. Remove x% of the lowest quality samples.
4. Compute the face recognition error (e.g. with sklearn.metrics.roc_curve https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html)
5. Repeat with a higher x.

Best Philipp
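The steps above could be sketched roughly as follows. This is a minimal sketch, not the evaluation code behind the paper. The function names `fnmr_at_fmr` and `error_vs_reject` are hypothetical, the embeddings are assumed to be L2-normalised so that a dot product is a cosine similarity, and higher quality scores are assumed to mean better images. `roc_curve` reports the false positive rate and true positive rate, which correspond here to FMR and 1 - FNMR.

```python
import numpy as np
from sklearn.metrics import roc_curve


def fnmr_at_fmr(scores, labels, target_fmr=0.001):
    """FNMR at the last ROC operating point whose FMR is <= target_fmr."""
    fmr, tmr, _ = roc_curve(labels, scores, drop_intermediate=False)
    # fmr is non-decreasing and starts at 0, so idx is always >= 0.
    idx = np.searchsorted(fmr, target_fmr, side="right") - 1
    return 1.0 - tmr[idx]


def error_vs_reject(embeddings, identities, qualities, ratios, target_fmr=0.001):
    """FNMR@FMR for each ratio of unconsidered (lowest-quality) images."""
    order = np.argsort(qualities)  # ascending: lowest quality first
    results = []
    for ratio in ratios:
        keep = order[int(round(ratio * len(order))):]
        emb, ids = embeddings[keep], identities[keep]
        # All unordered pairs among the remaining images.
        upper = np.triu_indices(len(keep), k=1)
        scores = (emb @ emb.T)[upper]  # cosine similarity (assumed normalised)
        labels = (ids[:, None] == ids[None, :])[upper].astype(int)
        results.append(fnmr_at_fmr(scores, labels, target_fmr))
    return results


# Toy data: the lowest-quality image of person "a" looks like person "b".
embeddings = np.array([[0, 1], [1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
identities = np.array(["a", "a", "a", "a", "b", "b", "b"])
qualities = np.array([0.1, 0.8, 0.9, 0.7, 0.6, 0.95, 0.85])
curve = error_vs_reject(embeddings, identities, qualities, ratios=[0.0, 0.15])
print(curve)
```

Note that the pairwise similarity matrix grows quadratically with the number of images, so for a dataset the size of LFW it may be necessary to compute the scores in chunks.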

pterhoer / FaceImageQuality