pterhoer / FaceImageQuality

Code and information for face image quality assessment with SER-FIQ

Plots in the Paper #46

Closed: praneet195 closed this issue 2 years ago

praneet195 commented 2 years ago

Hi,

To generate the LFW plots in the paper, do you drop pairs from LFW's pairs.txt? If so, a method that drops more pairs will end up with the best FNMR vs. Ratio of Unconsidered Images plot and would therefore appear better. How do you ensure that the same number of pairs is evaluated when different images are dropped by each IQA method?

For example, I was comparing your method against BRISQUE. In my experiments, the images BRISQUE dropped reduced the total number of evaluation pairs from LFW's pairs.txt more than your method did. I suspect this is why BRISQUE outperforms SER-FIQ in my experiments.

If possible, could you please explain the evaluation procedure in detail?
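To make the question concrete, here is a minimal sketch of how one point on an FNMR vs. ratio-of-unconsidered-images curve can be computed. It assumes (not confirmed in this thread) that the decision threshold is fixed beforehand, that higher scores mean more similar, and that a pair is kept only when both of its images survive the quality filter. The function and argument names are hypothetical. Returning the number of surviving pairs alongside the FNMR makes it visible when two quality methods end up being evaluated on different pair sets.

```python
import numpy as np


def erc_point(pair_i, pair_j, scores, is_genuine, quality, reject_ratio, threshold):
    """Hypothetical helper: FNMR at a fixed threshold after discarding the
    lowest-quality fraction of images. A pair survives only if both of its
    images survive. Returns (fnmr, number_of_surviving_pairs)."""
    n_images = len(quality)
    n_reject = int(round(reject_ratio * n_images))
    # Mark the n_reject lowest-quality images as unconsidered
    order = np.argsort(quality)
    keep_image = np.ones(n_images, dtype=bool)
    keep_image[order[:n_reject]] = False
    # A pair is evaluated only if both images are still considered
    keep_pair = keep_image[pair_i] & keep_image[pair_j]
    genuine_scores = scores[keep_pair & is_genuine]
    # False non-match: a genuine comparison scoring below the threshold
    fnmr = float(np.mean(genuine_scores < threshold)) if genuine_scores.size else float("nan")
    return fnmr, int(keep_pair.sum())
```

Plotting the surviving-pair count next to the FNMR for each IQA method shows directly whether the methods are compared on comparable amounts of data at each rejection ratio.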

praneet195 commented 2 years ago

I'm adding my result plots here. I used the BRISQUE implementation from this repo: https://github.com/RyanXingQL/Image-Quality-Assessment-Toolbox, which is just a Python wrapper around the MATLAB implementation.

[Plots: qe_eval, lfw_temps_iqa]

pterhoer commented 2 years ago

Hi Praneet,

As you correctly stated, the number of (genuine and impostor) pairs has a significant influence on verification performance. This is why we did not use the LFW benchmark with only 6k image pairs; instead, we compute all image combinations in LFW for the experiments. There is a closed issue where the evaluation on LFW is discussed in more detail.
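A minimal sketch of what "all image combinations" can look like, as opposed to the 6,000 pairs in pairs.txt, assuming one identity label per image. This is an illustration only, not the authors' evaluation code, and the function name is hypothetical. It counts each unordered pair once; whether the paper's protocol uses unordered or ordered pairs is not stated in this thread.

```python
import numpy as np


def all_pairs(labels):
    """Hypothetical helper: enumerate every unordered image pair (i < j)
    and mark it genuine when both images share an identity label."""
    labels = np.asarray(labels)
    # Upper triangle without the diagonal: each pair once, no self-comparisons
    i, j = np.triu_indices(len(labels), k=1)
    is_genuine = labels[i] == labels[j]
    return i, j, is_genuine
```

For the full LFW set (13,233 images) this yields roughly 87.5 million index pairs, so in practice the similarity matrix is usually processed in blocks rather than materialized at once.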

Best Philipp

praneet195 commented 2 years ago

Thank you for the reply. I'll look into it.

praneet195 commented 2 years ago

Hi,

So I created an exhaustive evaluation of LFW: I compared the embedding of every image with that of every other image in LFW (175,099,056 image pairs). The only difference between my code and your evaluation is that I used RetinaFace instead of MTCNN as the detector; the alignment and the ArcFace model are the same. These are the results I obtained. I'm not sure why BRISQUE still outperforms SER-FIQ.

[Plot: qe_eval_exhaust]
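The pair count above is consistent with ordered comparisons over LFW's 13,233 images: each image compared with every other image, in both directions, excluding self-comparisons. The quick check below just reproduces that arithmetic. With a symmetric similarity such as cosine, every unordered pair then appears twice, which doubles the sample counts but leaves FMR and FNMR rates unchanged.

```python
n_images = 13233  # number of images in LFW

ordered = n_images * (n_images - 1)  # every image vs. every other image, both directions
unordered = ordered // 2             # each distinct pair counted once

print(ordered, unordered)  # 175099056 87549528
```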

pterhoer commented 2 years ago

Hi Praneet,

Even at a 0% ratio of unconsidered images (the full LFW database without any quality assessment), the performance differs strongly from our results and from the MagFace results (0.007 vs. 0.04 FNMR at 0.001 FMR). So the errors or differences may arise before the quality assessment itself.
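For checking the 0% operating point, here is a minimal sketch of FNMR at a fixed FMR of 0.001 computed from genuine and impostor similarity scores. It assumes higher scores mean more similar and that a comparison is accepted only when its score is strictly above the threshold; the function name is hypothetical. Tie handling and threshold interpolation vary between evaluation toolkits, so it helps to run both pipelines on the same score files when tracking down a gap like this.

```python
import numpy as np


def fnmr_at_fmr(genuine, impostor, target_fmr=1e-3):
    """Hypothetical helper: FNMR at the threshold where at most target_fmr of
    impostor comparisons are accepted (score > threshold).
    Requires 0 <= target_fmr < 1. Returns (fnmr, threshold)."""
    impostor = np.sort(np.asarray(impostor, dtype=float))
    genuine = np.asarray(genuine, dtype=float)
    # Maximum number of impostor comparisons allowed to be accepted
    n_allowed = int(np.floor(target_fmr * impostor.size))
    # Only scores strictly above this value are accepted, so at most
    # n_allowed impostors pass
    threshold = impostor[impostor.size - n_allowed - 1]
    # False non-match: a genuine comparison that is not accepted
    fnmr = float(np.mean(genuine <= threshold))
    return fnmr, float(threshold)
```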

Best Philipp