happynear / FaceVerification

An Experimental Implementation of Face Verification, 96.8% on LFW.

How to finetune card-face dataset #65

Open ddxu opened 7 years ago

ddxu commented 7 years ago

@happynear I admire your work! You seem very experienced with face identification/verification problems. My question is how to fine-tune those excellent models on a card-face dataset, which has many identities (more than 100 thousand) but only two images per identity (one card face and one camera face). Clearly, my task is a verification task.

If I fine-tune on such a card-face dataset with softmax loss (or center loss, or their variants), which is optimized as a classification problem, I guess the last inner product layer will have a very large number of outputs (equal to the total number of identities). That will be hard for the network to learn, because the number of training samples is only double the number of identities. Even your latest work, "NormFace", also seems to learn in a classification style.
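To put rough numbers on this concern, here is a back-of-the-envelope sketch. The identity count uses the lower bound from the post (100,000) and two images per identity; the 512-dimensional feature size and the helper name `classifier_layer_stats` are assumptions for illustration only, not values from any of the models discussed.

```python
def classifier_layer_stats(num_identities, images_per_identity, feature_dim):
    """Size of a bias-free final inner product layer versus the training set.

    Returns (weight_count, training_images, weights_per_image).
    """
    weight_count = num_identities * feature_dim          # one weight vector per identity
    training_images = num_identities * images_per_identity
    return weight_count, training_images, weight_count / training_images


# Assumed values: 100,000 identities, 2 images each, 512-d features (hypothetical).
weights, images, ratio = classifier_layer_stats(100_000, 2, 512)
print(weights, images, ratio)  # 51200000 200000 256.0
```

Under these assumptions the last layer alone has 256 weights per training image, and each class weight vector is updated toward its own identity by only two samples.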

So far my fine-tuning work is mostly based on triplet loss. It works in some cases but still has some problems. I want to try some new methods, but I don't know which to try.

Can you give me any suggestions? Thank you~

happynear commented 7 years ago

You may refer to Figure 1 of the SphereFace paper to learn the difference between classification and metric learning. Then you can read the theory part of SphereFace and the metric learning part of NormFace to understand under which conditions we can use classification loss functions for metric learning tasks.
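As a rough illustration of how a classification loss can act on a metric, the sketch below computes a softmax cross-entropy over scaled cosine similarities between an L2-normalized feature and L2-normalized class weights. It is a minimal sketch in the spirit of normalized-feature losses, not the implementation from either paper; the function names and the `scale=20.0` default are assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_softmax_loss(feature, class_weights, label, scale=20.0):
    """Cross-entropy over scaled cosine similarities (scale is an assumed value)."""
    f = l2_normalize(feature)
    logits = []
    for w in class_weights:
        w_hat = l2_normalize(w)
        logits.append(scale * sum(a * b for a, b in zip(f, w_hat)))
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Toy example: two identities with orthogonal class weights (hypothetical data).
W = [[1.0, 0.0], [0.0, 1.0]]
loss_match = cosine_softmax_loss([3.0, 0.1], W, label=0)
loss_mismatch = cosine_softmax_loss([3.0, 0.1], W, label=1)
loss_rescaled = cosine_softmax_loss([30.0, 1.0], W, label=0)
```

Because both features and weights are normalized, the loss depends only on angles: `loss_rescaled` equals `loss_match` even though the feature is ten times longer, which is the same cosine quantity a verification system would threshold at test time.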

To use triplet loss, you need to implement a hard-sample mining algorithm to avoid the zero-gradient problem, which is difficult and tricky to get right. The current academic trend is to modify classification loss functions for metric learning tasks.
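The zero-gradient problem can be seen in a small sketch: a triplet whose negative is already far from the anchor has zero loss and contributes nothing to training. The code below uses one common mining strategy, often called batch-hard mining (farthest positive and closest negative within a batch). This is a choice made for illustration, not a method named in this thread; the function names, the `margin=0.2` default, and the toy embeddings are all assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss; it is zero when the negative is far enough away."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def batch_hard_triplets(embeddings, labels):
    """For each anchor, pick the farthest positive and the closest negative index."""
    triplets = []
    for a, (ea, la) in enumerate(zip(embeddings, labels)):
        positives = [i for i, l in enumerate(labels) if l == la and i != a]
        negatives = [i for i, l in enumerate(labels) if l != la]
        if not positives or not negatives:
            continue  # anchor cannot form a triplet in this batch
        p = max(positives, key=lambda i: euclidean(ea, embeddings[i]))
        n = min(negatives, key=lambda i: euclidean(ea, embeddings[i]))
        triplets.append((a, p, n))
    return triplets


# Toy batch: two identities with two images each (hypothetical embeddings).
embeddings = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [0.2, 0.0]]
labels = [0, 0, 1, 1]

triplets = batch_hard_triplets(embeddings, labels)
easy_loss = triplet_loss(embeddings[0], embeddings[1], embeddings[2])  # far negative
hard_loss = triplet_loss(embeddings[0], embeddings[1], embeddings[3])  # mined negative
```

Here `easy_loss` is exactly zero, so that triplet would produce no gradient, while the mined triplet in `hard_loss` still violates the margin. With only two images per identity, every anchor has a single possible positive, so in that setting the mining effort falls entirely on choosing negatives.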

ddxu commented 7 years ago

@happynear Thank you for your suggestions. I will read these papers carefully and experiment with their code.

xuguozhi commented 6 years ago

But since "each identity has only two images", should we consider few-shot learning?